Re: make dist using git archive

From: Eli Schwartz <eschwartz93(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Tristan Partin <tristan(at)neon(dot)tech>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: make dist using git archive
Date: 2024-01-26 21:18:58
Message-ID: 76516f81-31b0-40db-b30e-2fe9e332895a@gmail.com
Lists: pgsql-hackers

Hello, meson developer here.

On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option.  That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.

Well, it uses Python's tarfile module, which uses Python's gzip support
under the hood, but yes, that is true: tarfile doesn't expose this tunable.
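
For illustration, one way around the missing tunable is to create the gzip
layer yourself and hand it to tarfile. This is only a sketch, not what meson
dist does today; make_tar_gz and the fixed member mtime of 0 are assumptions
made up for the example.

```python
import gzip
import io
import tarfile


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Build a tar.gz whose gzip header carries no timestamp or file name."""
    buf = io.BytesIO()
    # mtime=0 zeroes the gzip header timestamp (the equivalent of gzip -n);
    # filename="" keeps the original-name field out of the header as well.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        # "w|" writes an uncompressed tar stream into the gzip wrapper.
        with tarfile.open(fileobj=gz, mode="w|") as tar:
            for name in sorted(files):
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = 0  # fixed member timestamp, assumed for this sketch
                tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Calling it twice with the same contents should give byte-identical output,
since nothing in the gzip header depends on the wall clock anymore.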

> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms.  (I don't
> know if gzip has an option to avoid that.  git archive uses an internal
> gzip implementation that handles this.)

This appears to be https://github.com/python/cpython/issues/112346
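
For reference, the timestamp and the platform indicator both sit in the
fixed 10-byte gzip header (RFC 1952: MTIME at bytes 4-7, OS at byte 9). A
rough sketch of inspecting and neutralizing them after the fact follows; the
helper names are hypothetical, and patching the header in place is my own
illustration, not something Meson or git does.

```python
import gzip
import struct


def gzip_header_fields(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip member header defined in RFC 1952."""
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_code}


def normalize_gzip_header(data: bytes) -> bytes:
    """Return a copy with MTIME zeroed and the OS byte set to 255 (unknown)."""
    fields = gzip_header_fields(data)
    if fields["flags"] & 0x02:  # FHCRC set: a header CRC would no longer match
        raise ValueError("header CRC present; refusing to patch")
    return data[:4] + b"\x00\x00\x00\x00" + data[8:9] + b"\xff" + data[10:]
```

The trailing CRC32 in a gzip member covers only the uncompressed data, so the
patched stream still decompresses to the same payload.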

> 3. Meson does not support tar.bz2 archives.

Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.

> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.

What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.
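
To make that concrete: assume .gitattributes contains the line
".git_archival.txt export-subst" and the file itself holds placeholders such
as "node: $Format:%H$". git archive expands those on export. The field names
and the parser below are assumptions for illustration; a build script could
read the result roughly like this:

```python
# Contents of .git_archival.txt as committed (before git archive expands it).
# The "node" and "date" keys are an assumed layout for this sketch.
TEMPLATE = "node: $Format:%H$\ndate: $Format:%cI$\n"


def parse_git_archival(text: str) -> dict[str, str]:
    """Return the expanded fields, skipping any placeholder left unexpanded."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        value = value.strip()
        if "$Format:" in value:
            continue  # unexpanded: this tree did not come from git archive
        fields[key.strip()] = value
    return fields
```

In a plain git checkout the placeholders stay unexpanded and the parser
returns an empty dict, so the caller can fall back to asking git directly.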

> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders.  I suspect that the Python
> tarfile module introduces either some randomness or platform dependency.

That the two orders differ is meaningless; the question is whether each
order is internally consistent. Python uses sorted() to guarantee a stable
order, which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable and that is
what matters.
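
A small sketch of what "stable" means here, using plain tarfile rather than
Meson's actual code (build_tar and member_order are names invented for the
example): if members are added in sorted() order, the archive does not depend
on the order in which the files were discovered.

```python
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Write an uncompressed tar with members added in sorted() order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # code point order, independent of input order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def member_order(tar_bytes: bytes) -> list[str]:
    """List member names in the order they are stored in the archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return tar.getnames()
```

Two different insertion orders then give the same bytes, even though the
resulting order need not match what git archive would emit.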

> 6. meson dist is also slower because of the additional work.

I'm amenable to skipping the extraction/recombination of subprojects and
the running of dist scripts in the event that neither exists, as Tristan
offered to do, but...

> 7. meson dist produces .sha256sum files but we have called them .sha256.
>  (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
>
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements.  But for now this route would have some
> significant disadvantages.

Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs, although reproducible builds are
mainly about artifacts, not sources. For sources it doesn't generally
matter unless the sources are ephemeral and generated on-demand (in which
case it is indeed very important to produce the same tarball each time).
A tarball is usually generated once, signed, and uploaded to release
hosting. Meson already guarantees the contents are strictly based on the
built tag.

--
Eli Schwartz

