Re: Order of operations in SubPostmasterMain()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Order of operations in SubPostmasterMain()
Date: 2016-09-29 20:35:15
Message-ID: 20160929203515.hftq6tslmduosnl2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-09-29 15:46:00 -0400, Tom Lane wrote:
> I noticed that buildfarm member culicidae, which is running an
> EXEC_BACKEND build on Linux, occasionally falls over like this:
>
> FATAL: could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument
>
> That's probably because Andres failed to disable ASLR on that machine,

Intentionally so, FWIW. Want to enable PIE as well soon-ish.

Newer versions of windows / msvc want to enable ASLR by default, and I
think that'll break parallel query bgworkers, because they pass the
entry point around as a pointer. So I don't think we'll be able to duck
this issue for much longer.

> but the exact cause isn't too important right now. What I'm finding
> distressing is that that message doesn't show up on the client side,
> making it look like a postmaster crash:

Indeed.

> The reason we don't see anything is that backend libpq hasn't been
> initialized yet, so it won't send the ereport message to the client.
>
> The original design of SubPostmasterMain() intended that such cases
> would be reported, because it runs BackendInitialize (which calls
> pq_init()) before CreateSharedMemoryAndSemaphores (which is where
> the reattach used to happen --- the comments at postmaster.c 4730ff
> and 4747ff reflect this expectation). We broke it when we added
> the earlier reattach call at lines 4669ff.
>
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach(). It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers. I'm not quite
> sure if it's worth it; thoughts? In any case the mentioned comments
> are obsolete and need to be moved/rewritten.

Hm. It doesn't really seem necessary to directly allocate that
much. Seems pretty harmless to essentially let it start at 0 bytes (so
we can repalloc, but don't use a lot of memory)?

> Another thing that I'm finding distressing while I look at this is
> the placement of ClosePostmasterPorts(). That needs to happen *early*,
> because as long as the child process is holding open the postmaster side
> of the postmaster death pipe, we are risking unhappiness. Not only is
> it not particularly early, it's after process_shared_preload_libraries()
> which could invoke arbitrarily stupid stuff. Whether or not we do
> anything about moving pq_init(), I'm thinking we'd better move the
> ClosePostmasterPorts() call so that it's done as soon as possible,
> probably right after read_backend_variables() which loads the data
> it needs.

That seems like a good idea. Besides the benefit you mention, it looks
like that'll also reduce duplication.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrian Klaver 2016-09-29 20:55:19 Re: pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Previous Message Tom Lane 2016-09-29 20:32:44 Re: Order of operations in SubPostmasterMain()