From: | Eli Schwartz <eschwartz93(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Tristan Partin <tristan(at)neon(dot)tech> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: make dist using git archive |
Date: | 2024-01-26 21:18:58 |
Message-ID: | 76516f81-31b0-40db-b30e-2fe9e332895a@gmail.com |
Lists: | pgsql-hackers |
Hello, meson developer here.
On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option. That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.
Well, it uses Python's tarfile module, which uses Python's gzip support
under the hood, but yes, that is true: tarfile doesn't expose this tunable.
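As a minimal sketch of one possible workaround (not what Meson currently does), the gzip layer can be created by hand and passed to tarfile as a file object, which allows the header timestamp to be pinned much as `gzip -n` would. The helper name `build_tar_gz` and the in-memory `{name: bytes}` input are assumptions made for illustration.

```python
import gzip
import io
import tarfile


def build_tar_gz(files, mtime=0):
    """Return tar.gz bytes for a {name: bytes} mapping with a fixed gzip mtime."""
    raw = io.BytesIO()
    # filename="" keeps a file name out of the gzip header; mtime pins the
    # header timestamp instead of using the current time.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=mtime) as gz:
        # mode="w" writes an uncompressed tar stream into the gzip wrapper.
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = 0  # fixed member timestamp as well
                tar.addfile(info, io.BytesIO(files[name]))
    return raw.getvalue()


first = build_tar_gz({"b.txt": b"b", "a.txt": b"a"})
second = build_tar_gz({"a.txt": b"a", "b.txt": b"b"})
print(first == second)
```

Because both the gzip header and the tar member metadata are fixed, two runs over the same contents should yield identical bytes on the same Python and zlib versions.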
> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms. (I don't
> know if gzip has an option to avoid that. git archive uses an internal
> gzip implementation that handles this.)
This appears to be https://github.com/python/cpython/issues/112346
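For reference, the timestamp and the platform indicator both live in the fixed 10-byte gzip header defined by RFC 1952 (MTIME at bytes 4 to 7, OS at byte 9). The following sketch reads them back from a stream; the function name `gzip_header_fields` is hypothetical, and no particular OS value is assumed, since that depends on the writer.

```python
import gzip
import io
import struct


def gzip_header_fields(data):
    """Return (mtime, os_byte) from the fixed 10-byte gzip header (RFC 1952)."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Layout: ID1 ID2 CM FLG MTIME(4 bytes, little-endian) XFL OS
    mtime, _xfl, os_byte = struct.unpack("<IBB", data[4:10])
    return mtime, os_byte


buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
    gz.write(b"hello")

mtime, os_byte = gzip_header_fields(buf.getvalue())
print(mtime, os_byte)
```

Running this against tarballs produced on different platforms is a quick way to see whether the OS byte is the only differing header field.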
> 3. Meson does not support tar.bz2 archives.
Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.
> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.
What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.
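A rough sketch of how this could look, under stated assumptions: `.gitattributes` marks the file with `.git_archival.txt export-subst`, and the file holds `key: value` lines such as `node: $Format:%H$` and `node-date: $Format:%cI$`. The key names `node` and `node-date` and the parser `parse_git_archival` are illustrative choices, not something this thread prescribes. When `git archive` exports the tree it expands the placeholders; in a plain checkout they remain verbatim.

```python
def parse_git_archival(text):
    """Parse 'key: value' lines; return None for values git did not expand."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        value = value.strip()
        # In a plain checkout (not a `git archive` export) the placeholder
        # is still present verbatim, e.g. "$Format:%H$".
        info[key.strip()] = None if value.startswith("$Format:") else value
    return info


exported = (
    "node: 0123456789abcdef0123456789abcdef01234567\n"
    "node-date: 2024-01-26T21:18:58+00:00\n"
)
unexpanded = "node: $Format:%H$\nnode-date: $Format:%cI$\n"

print(parse_git_archival(exported)["node"])
print(parse_git_archival(unexpanded)["node"])
```

A build system could read the parsed `node` value at configure time and embed it in the version string, falling back to `git describe` or similar when the value is None.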
> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders. I suspect that the Python
> tarfile module introduces some either randomness or platform dependency.
A different order is meaningless in itself; the question is whether each
tool's order is internally consistent. Python uses sorted() to guarantee a
stable order, which may be a different algorithm than the one git-archive
uses to guarantee a stable order. But the order should be stable, and that
is what matters.
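To illustrate the distinction with a toy example (the path list and both orderings are made up, and neither is claimed to be the exact rule Meson or git-archive applies): two sort rules can each be fully deterministic and still disagree with one another.

```python
import random

paths = ["src/b.c", "README", "src/a.c", "doc/index.md", "Makefile"]


def order_by_codepoint(names):
    """Plain sorted(): compares strings by code point, so uppercase sorts first."""
    return sorted(names)


def order_case_insensitive(names):
    """A different but equally deterministic rule, for comparison only."""
    return sorted(names, key=str.lower)


shuffled = paths[:]
random.shuffle(shuffled)

# Each rule gives the same result no matter how the input was ordered...
print(order_by_codepoint(shuffled) == order_by_codepoint(paths))
print(order_case_insensitive(shuffled) == order_case_insensitive(paths))
# ...but the two rules do not agree with each other.
print(order_by_codepoint(paths) == order_case_insensitive(paths))
```

So a difference between two tools' archives says nothing about reproducibility; only a difference between two runs of the same tool would.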
> 6. meson dist is also slower because of the additional work.
I'm amenable to skipping the extraction/recombination of subprojects and
running of dist scripts in the event that neither exist, as Tristan
offered to do, but...
> 7. meson dist produces .sha256sum files but we have called them .sha256.
> (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements. But for now this route would have some
> significant disadvantages.
Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs. But reproducible builds are mainly
about artifacts, not sources, and for sources it doesn't generally matter
unless the sources are ephemeral and generated on demand (in which case it
is indeed very important to produce the same tarball each time). A tarball
is usually generated once, signed, and uploaded to release hosting. Meson
already guarantees the contents are strictly based on the built tag.
--
Eli Schwartz
Attachment | Content-Type | Size |
---|---|---|
OpenPGP_0x84818A6819AF4A9B.asc | application/pgp-keys | 17.7 KB |