#12 Python pyc file serialization is architecture-specific
Opened 3 months ago by zbyszek. Modified a month ago

The issue occurs when we have a noarch package that contains a .pyc file. The result of the serialization on different architectures is "functionally equivalent" but not bit-for-bit identical.

Example: meson-1.3.2-1.fc41, https://bugzilla.redhat.com/show_bug.cgi?id=2266767.
See: https://bugzilla.redhat.com/show_bug.cgi?id=1686078#c2,
https://docs.fedoraproject.org/en-US/packaging-guidelines/Python_Appendix/#_byte_compilation_reproducibility

It's not clear what we should do here… The nicest solution would be to make Python bytecode serialization arch-independent. Another solution would be to apply the marshalparser fixup automatically in package builds. Yet another solution would be to acknowledge the issue and always do Python package rebuilds on the original architecture. I'm creating this issue because I expect that this will come up quite a bit with various Python packages.
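To make "functionally equivalent but not bit-for-bit identical" concrete, here is a minimal sketch (not the marshalparser or add-determinism approach) that classifies two .pyc files by first comparing raw bytes and then comparing the unmarshalled objects. It assumes the 16-byte header layout used since Python 3.7 (PEP 552) and that both files were written for the same Python version that runs the script; `split_pyc` and `compare_pyc` are hypothetical helper names. `marshal.loads` should only be used on trusted files.

```python
import importlib.util
import marshal

# Since Python 3.7 (PEP 552) a .pyc starts with a 16-byte header:
# 4-byte magic number, 4-byte flags, then 8 bytes holding either
# source mtime + size or a source hash. The rest is marshal data.
PYC_HEADER_SIZE = 16


# Hypothetical helper for illustration.
def split_pyc(data: bytes) -> tuple[bytes, bytes]:
    """Split raw .pyc contents into (header, marshalled payload)."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        # marshal output is version-specific, so refuse foreign files.
        raise ValueError("not a .pyc for this Python version")
    return data[:PYC_HEADER_SIZE], data[PYC_HEADER_SIZE:]


# Hypothetical helper for illustration.
def compare_pyc(a: bytes, b: bytes) -> str:
    """Return 'identical', 'equivalent', or 'different'.

    'equivalent' means the bytes differ, but the headers match and the
    unmarshalled objects (normally code objects) compare equal.
    """
    if a == b:
        return "identical"
    head_a, body_a = split_pyc(a)
    head_b, body_b = split_pyc(b)
    if head_a == head_b and marshal.loads(body_a) == marshal.loads(body_b):
        return "equivalent"
    return "different"
```

A check like this only tells you whether two builds differ in serialization details; it does not rewrite the files into a canonical form, which is what a fixup such as marshalparser does.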


How does Debian deal with it?

They don't do byte-compilation at build time. It gets done at install time using py3compile.

(To be clear, byte-compilation at install time is not a good idea, but that's what they do, so they don't hit this problem.)

I should update the status here:
https://github.com/keszybz/add-determinism/blob/main/src/handlers/pyc.rs
implements a cleaner. I think this is good enough for us for now.

Ideally, long term, Python itself would be fixed to write pyc files that are deterministic. This would also help in other contexts where people use them.

Wow, that sounds like great news, and thank you both for these lightning-fast updates.

I assumed pyc irreproducibility was a solved problem because Debian's percentage of "reproducible builds" looks pretty good (from the outside).

There are a bunch of things like this that Debian has historically done that make their reproducibility not quite as good as it is made out to be.

Yeah, there's quite a bit of difference between distributions in little details like this. We have a similar story with .gdb_index: we add this in a post-processing phase to make debugging nicer and we were hit by https://bugzilla.redhat.com/show_bug.cgi?id=2232086. AFAIK, it hasn't come up earlier in other distributions, but that's because we actually provide usable debuginfo data.
