Re: Regression tests fail with musl libc because libpq.so can't be loaded

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Wolfgang Walther <walther(at)technowledgy(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Regression tests fail with musl libc because libpq.so can't be loaded
Date: 2024-03-18 21:17:36
Message-ID: CA+hUKG+Tq3GK7bPd03N0Eox3YY4-Hjd7qQjo_QZFjdbhTqQGQA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Mar 19, 2024 at 3:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > (Hmm, I think it's not that unreasonable on their part to assume the
> > initial environment is immutable if their implementation doesn't
> > mutate it, and our doing so is undeniably UB; surprising, maybe, given
> > that the technique works on that other popular brand of C library on
> > that kind of kernel, not to mention dozens of old Unixen of yore...
>
> Does their implementation also ignore the effects of putenv() or
> setenv() on LD_LIBRARY_PATH? They have no moral high ground
> whatsoever if that's the case. But if it doesn't, an alternative
> route to a solution could be to scan the original environment, strdup
> and putenv each entry to move it to freshly malloc'd space, and
> then reclaim the old environment area.

Yes, the musl linker/loader ignores putenv()/setenv() changes to
LD_LIBRARY_PATH after process start (that is, changes only affect the
search path when injected into a new program with exec*()). glibc
ignores them too; it just captures the value by copy instead of by
reference (according to one of the links above, I didn't check the source). So
setenv() has no effect on dlopen() in *this* program, and using putenv
in that way won't help. We simply can't move the value of
LD_LIBRARY_PATH (though my patch could be a little sneakier and steal
all the bytes right up to the = sign to get more space for our
message!).
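
For reference, the relocation idea quoted above (scan the environment, copy each entry to malloc'd space) could be sketched roughly as follows. This is only an illustration, not code from any patch: relocate_environ() is a hypothetical name, and it swaps in a whole new environ array instead of calling putenv() per entry. As explained above, it would not change what the loader captured at startup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict modes */
#endif
#include <stdlib.h>
#include <string.h>

extern char **environ;

/*
 * Hypothetical helper: copy every "NAME=value" string reachable from
 * environ into freshly malloc'd memory and point environ at the new
 * array.  Returns the number of entries copied, or -1 on allocation
 * failure (environ is left unchanged in that case).  The old strings
 * are not freed or reused here.
 */
int
relocate_environ(void)
{
	size_t		n = 0;
	char	  **new_environ;

	/* Count the existing entries. */
	while (environ != NULL && environ[n] != NULL)
		n++;

	new_environ = malloc((n + 1) * sizeof(char *));
	if (new_environ == NULL)
		return -1;

	for (size_t i = 0; i < n; i++)
	{
		new_environ[i] = strdup(environ[i]);
		if (new_environ[i] == NULL)
		{
			/* Undo the partial copy. */
			while (i > 0)
				free(new_environ[--i]);
			free(new_environ);
			return -1;
		}
	}
	new_environ[n] = NULL;

	environ = new_environ;
	return (int) n;
}
```

After this runs, getenv() returns pointers into the new copies, while a loader that kept a reference to the original LD_LIBRARY_PATH string still looks at the old bytes.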

One way to tell if a copy has been made is to trace a program that does:

getenv("LD_LIBRARY_PATH")[2] = 'X';
dlopen("foo.so", RTLD_NOW | RTLD_GLOBAL);

... when run with LD_LIBRARY_PATH set to /asdf. On FreeBSD I see it
tries to open "/aXdf...", so now I know that FreeBSD also captures it
by reference like musl. But we don't use the clobber trick on
FreeBSD, it has a proper setproctitle() function that knows how to
negotiate with the kernel, so it doesn't matter. FreeBSD also ignores
changes made with setenv()/putenv(), because those create fresh
entries but leave the initial environment strings untouched.
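
A slightly more defensive form of the mutation step in that probe might look like the sketch below. clobber_env_byte() is a made-up helper name; it only overwrites a byte of the existing value in place and checks bounds first, and it leaves the dlopen() call and the tracing to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: overwrite one byte of the current value of the
 * environment variable "name" in place, without calling setenv() or
 * putenv().  Returns 0 on success, or -1 if the variable is unset or
 * idx is past the end of its value.
 */
int
clobber_env_byte(const char *name, size_t idx, char c)
{
	char	   *value = getenv(name);

	if (value == NULL || idx >= strlen(value))
		return -1;

	value[idx] = c;
	return 0;
}
```

Calling clobber_env_byte("LD_LIBRARY_PATH", 2, 'X') and then dlopen("foo.so", RTLD_NOW | RTLD_GLOBAL) under a syscall tracer shows which path the loader searches: the modified one if it holds a reference, the original one if it made a copy.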

Solaris also ignores changes made after startup (it's in the dlopen
man page), and from a very quick look at its ld_lib_setup() I think it
achieved that with a copy. I believe its ancestor SunOS 4 invented
all of these conventions (and the mmap/virtual memory concepts they
rode in on), later nailed down to some degree in the System V ABI and
very widely adopted, but I don't see anything in the latter that
specifically addresses this point, e.g. LD_LIBRARY_PATH copy vs reference and
interaction with dlopen() (perhaps I didn't look hard enough). I'm
not sure what else you can point to to make strong claims about this
stuff, but I bet every system ignores changes after startup, it's just
that they found two ways to achieve that. POSIX says of dlopen that
the "file [argument] is used in an implementation-defined manner", and
of environ that we're welcome to swap a whole new environ, but doesn't
seem to tell us anything about the one that is replaced (who owns it?
is the initial one set up at execution time special? etc). The line
banning manipulation of the pointers environ refers to doesn't exactly
describe what we're doing (we're manipulating the strings pointed to
by the *previous* environ). UB.
