Re: Regression tests fail with musl libc because libpq.so can't be loaded

From: Wolfgang Walther <walther(at)technowledgy(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Regression tests fail with musl libc because libpq.so can't be loaded
Date: 2024-03-21 20:16:46
Message-ID: f98cd8de-1c66-491a-8409-e62c09932080@technowledgy.de
Lists: pgsql-bugs pgsql-hackers

Thomas Munro:
> Of course we have to distinguish between the basic argv[] clobbering
> trick which is barely even a trick, and the more advanced environ
> stealing trick which confuses musl.

Right. The latter not only confuses musl, but also makes
/proc/<pid>/environ return garbage. This is also mentioned at the bottom
of main.c, which has a workaround for the specific case of UBSan, which
depends on that file. This is kind of funny: because we rely on
undefined behavior regarding the modification of environ, we need a
workaround for the "UndefinedBehaviorSanitizer". I guess by failing
without this workaround, it wanted to tell us something.

This happens on glibc, too.
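To make the "environ stealing" step concrete, here is a minimal sketch under stated assumptions. It is not PostgreSQL's ps_status.c code, and the helper name dup_environ is hypothetical. It copies every environment string to the heap and returns a new vector. A caller would assign that to environ and from then on treat the memory of the original strings as spare ps status space. Anything that still reads the original region afterwards, such as the kernel when serving /proc/<pid>/environ, sees whatever was written there.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a NULL-terminated environment vector onto
 * the heap. After "environ = dup_environ(environ);" getenv() reads the
 * copies, and the original strings are no longer referenced through
 * environ, which is what makes overwriting them "safe" for libc itself.
 * Returns NULL on allocation failure (nothing is leaked in that case).
 */
char **dup_environ(char **envp)
{
    size_t n = 0;
    while (envp[n] != NULL)
        n++;

    char **copy = malloc((n + 1) * sizeof(char *));
    if (copy == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        size_t len = strlen(envp[i]) + 1;   /* include the NUL */
        copy[i] = malloc(len);
        if (copy[i] == NULL)
        {
            while (i > 0)                   /* free what was copied so far */
                free(copy[--i]);
            free(copy);
            return NULL;
        }
        memcpy(copy[i], envp[i], len);
    }
    copy[n] = NULL;
    return copy;
}
```

The copy only redirects lookups that go through environ. Any pointer taken into the original strings before the swap, and the kernel's own view of the original memory, is unaffected. That is why the trick is fragile.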

So summarizing:

1. The simple approach is to use PS_USE_CLOBBER_ARGV on Linux only for
glibc and other known-to-be-good-and-identifiable libc variants, and
otherwise default to PS_USE_NONE. This not only keeps the
/proc/.../environ problem for glibc users, but also disables ps status
for musl entirely. Considering that probably the biggest use case for
musl is running postgres in containers, it's quite likely that more than
one cluster runs on a single machine. In that case, ps status would be
especially handy to identify which cluster a process belongs to.
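A minimal sketch of how option 1 might be expressed at compile time, assuming the selection happens in the preprocessor. Only the Linux branch is shown, and this is not the actual ps_status.c logic. The mode names come from the discussion, while ps_mode_name is a hypothetical helper for illustration. glibc defines __GLIBC__ through <features.h>, which its standard headers pull in. musl deliberately ships no identifying macro, so it lands in the PS_USE_NONE branch.

```c
#include <string.h>   /* any libc header, so that __GLIBC__ is visible */

/* Illustrative selection: clobber argv only on Linux with glibc. */
#if defined(__linux__) && defined(__GLIBC__)
#define PS_USE_CLOBBER_ARGV
#else
#define PS_USE_NONE          /* musl and unknown libcs end up here */
#endif

/* Hypothetical helper reporting which mode was compiled in. */
const char *ps_mode_name(void)
{
#if defined(PS_USE_CLOBBER_ARGV)
    return "PS_USE_CLOBBER_ARGV";
#else
    return "PS_USE_NONE";
#endif
}
```

The drawback described above is visible directly: a musl build compiles the same source into the PS_USE_NONE branch, with no way to opt back in.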

2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH
or LD_PRELOAD is found, to keep those intact. This keeps ps status
support on musl, which is good. But the /proc/.../environ problem
remains, unchanged.
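A sketch of option 2, assuming the clobberable area is grown over environment strings that directly follow the argv strings in memory, as described above. The function names are hypothetical, and this only loosely resembles the real ps_status.c loop. The extension stops at the first LD_LIBRARY_PATH or LD_PRELOAD entry, so those strings and everything after them stay untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check for the loader variables that must survive. */
static int is_loader_var(const char *s)
{
    return strncmp(s, "LD_LIBRARY_PATH=", 16) == 0 ||
           strncmp(s, "LD_PRELOAD=", 11) == 0;
}

/*
 * Hypothetical sketch: end_of_area points at the NUL terminating the last
 * contiguous argv string. Extend it over environ strings that immediately
 * follow in memory, stopping at the first non-contiguous string or at the
 * first loader variable. Returns the new end of the area.
 */
char *extend_area_into_environ(char *end_of_area, char **envp)
{
    for (size_t i = 0; envp[i] != NULL; i++)
    {
        if (end_of_area + 1 != envp[i])
            break;              /* not directly after the area */
        if (is_loader_var(envp[i]))
            break;              /* keep LD_* strings intact */
        end_of_area = envp[i] + strlen(envp[i]);
    }
    return end_of_area;
}
```

Everything before the first loader variable is still overwritten, so /proc/<pid>/environ keeps showing ps status text for those entries.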

Both of those approaches rely on the undefined behavior of clobbering
environ.

3. The logical consequence is to stop clobbering environ and use only
the available argv space. However, this will quickly leave us with a
very small ps status buffer to work with, making the feature less
useful. Note that this could, in theory, also happen today by starting
postgres with as few arguments and as little environment as possible.
I'm not sure what minimal buffer size could be reached that way. The
point is: the buffer size is not guaranteed at all.
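To see how small an argv-only buffer can get, here is a sketch of the size computation under option 3. The helper argv_area_size is hypothetical. It counts the bytes from the start of argv[0] up to, but not including, the NUL of the last argv string that is contiguous with its predecessor.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: size of the memory occupied by contiguous argv
 * strings, measured from argv[0] to the final NUL (exclusive). This is all
 * the ps status may use if environ is left alone.
 */
size_t argv_area_size(int argc, char **argv)
{
    if (argc < 1 || argv[0] == NULL)
        return 0;

    char *end_of_area = argv[0] + strlen(argv[0]);
    for (int i = 1; i < argc; i++)
    {
        if (end_of_area + 1 != argv[i])
            break;              /* stop at the first gap */
        end_of_area = argv[i] + strlen(argv[i]);
    }
    return (size_t) (end_of_area - argv[0]);
}
```

For example, a server started as `postgres -D /data` has "postgres", "-D" and "/data" laid out back to back. That gives 8 + 1 + 2 + 1 + 5 = 17 bytes, far below the 256 bytes used on other platforms.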

4. The upstream (musl) suggestion, for which I sent a PoC, was to "exec
yourself with a bigger argv". This works. I chose to pad argv0 with
trailing slashes. Those can safely be stripped away again: any argv0
that already came with a trailing slash would not name the current
executable but a directory, so exec would have failed immediately
anyway. This keeps /proc/.../environ intact and does not rely on
undefined behavior. Additionally, we get a guaranteed ps buffer size of
256 bytes, which is what we use on BSDs and Windows, too.
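The padding and stripping steps of option 4 can be sketched as two small helpers. This is not the PoC itself: the names pad_argv0 and strip_argv0_padding are hypothetical, and the actual re-exec call is left out. The sketch assumes the current argv area size is already known, pads argv0 by the shortfall so that the argv area after re-exec is at least the target (e.g. 256) bytes, and then strips every trailing slash, following the reasoning above that a genuine argv0 never ends in one.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of argv0 followed by enough
 * '/' characters to grow the argv area from `current` to at least
 * `target` bytes. No padding is added if the area is already big enough.
 * Returns NULL on allocation failure.
 */
char *pad_argv0(const char *argv0, size_t current, size_t target)
{
    size_t len = strlen(argv0);
    size_t pad = current < target ? target - current : 0;
    char *padded = malloc(len + pad + 1);
    if (padded == NULL)
        return NULL;
    memcpy(padded, argv0, len);
    memset(padded + len, '/', pad);
    padded[len + pad] = '\0';
    return padded;
}

/*
 * Hypothetical helper, used after the re-exec: remove the padding in place.
 * Only trailing slashes are removed; slashes inside the path are kept.
 */
void strip_argv0_padding(char *argv0)
{
    size_t len = strlen(argv0);
    while (len > 0 && argv0[len - 1] == '/')
        argv0[--len] = '\0';
}
```

With the 17-byte example from option 3, pad_argv0("postgres", 17, 256) appends 239 slashes. Stripping afterwards yields "postgres" again, and the area itself keeps its padded size for the ps status.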

I wonder why we fall back to PS_USE_NONE by default at all, and how much
of that is related to the environment clobbering to begin with. Could we
even use the exec approach as the fallback in all other cases except
BSDs and Windows, and get rid of PS_USE_NONE? Clobbering only argv
certainly seems much safer than what we do right now.

Best,

Wolfgang
