Re: retry shm attach for windows (WAS: Re: OK, so culicidae is *still* broken)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: retry shm attach for windows (WAS: Re: OK, so culicidae is *still* broken)
Date: 2017-06-06 18:11:06
Message-ID: CA+TgmoY5ZSbc9L_BmpkgNFaz9nE2-vGNjCm=Nut4kkZJ4UzpKg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 6, 2017 at 1:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jun 6, 2017 at 12:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> By definition, the address range we're trying to reuse worked successfully
>>> in the postmaster process. I don't see how forcing a specific address
>>> could do anything but create an additional risk of postmaster startup
>>> failure.
>
>> If the postmaster picked an address where other things are unlikely to
>> get loaded, then that would increase the chances of child processes
>> finding it available, wouldn't it?
>
> But how would we know that a particular address range is more unlikely
> than others to have a conflict? (And even if we do know that, what
> happens when there is a conflict anyway?) I sure don't want to be in
> the business of figuring out what to use across all the different Windows
> versions there are, to say nothing of the different antivirus products
> that might be causing the problem.

I'm not quite sure how to respond to this. How do we know any piece
of information about anything, ever? Sometimes we figure it out by
looking for relevant sources using, say, Google, and other times we
determine it by experiment or from first principles. You and Andres
were having a discussion earlier about gathering this exact
information, so apparently you thought it might be possible back then:

https://www.postgresql.org/message-id/4180.1492292046%40sss.pgh.pa.us

Now, that having been said, I agree that no address range is perfectly
safe (and that's why it's good to have retries). I also agree that
this is likely to be heavily platform-dependent, which is why I wrote
DSM the way that I did instead of (as Noah was advocating) trying to
solve the problem of getting a constant mapping across all processes
in a parallel group. But since nobody's keen on the idea of trying to
tolerate having the main shared memory segment at different addresses
in different processes, we'll have to come up with some other solution
for that case. If the retry thing doesn't plug the hole adequately,
trying to put the initial allocation in some place that's less likely
to induce conflicts seems like the next thing to try.
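
To make the retry idea concrete, here is a minimal sketch, not the
actual PostgreSQL code. It uses POSIX mmap() with an address hint as a
stand-in for the Windows reattach step, and the helper names
try_attach_at() and attach_with_retries() are hypothetical. The point is
only the shape of the logic: ask for the address the parent used, treat
"mapped somewhere else" as a failure, and try again a bounded number of
times.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to map `size` bytes exactly at `want`.
 * Without MAP_FIXED, `want` is only a hint, so the kernel may hand
 * back a different address when the range is already occupied.  We
 * treat that as a failed attach and undo the mapping.
 */
static void *
try_attach_at(void *want, size_t size)
{
    void *got = mmap(want, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (got == MAP_FAILED)
        return NULL;
    if (got != want)
    {
        munmap(got, size);      /* wrong place: give it back */
        return NULL;
    }
    return got;
}

/*
 * Hypothetical helper: bounded retry around try_attach_at().
 * Returns the mapping at `want`, or NULL if every attempt failed.
 * In a real server a retry would mean launching a fresh child, whose
 * address space may be laid out differently; retrying inside the same
 * unchanged process, as done here, only illustrates the control flow.
 */
static void *
attach_with_retries(void *want, size_t size, int max_tries)
{
    assert(max_tries > 0);

    for (int attempt = 0; attempt < max_tries; attempt++)
    {
        void *p = try_attach_at(want, size);

        if (p != NULL)
            return p;
    }
    return NULL;
}
```

A caller would pass the address recorded by the parent process, and
fall back to reporting an error when NULL comes back. Picking a
less-contended address up front would not change this loop; it would
just make the first attempt succeed more often.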

> Also, the big picture here is that we ought to be working towards allowing
> our Windows builds to use ASLR; our inability to support that is not
> something to be proud of in 2017. No predetermined-address scheme is
> likely to be helpful for that.

Sure, retrying is better for that as well, as I already said upthread,
but that doesn't mean that putting the initial mapping someplace less
likely to conflict couldn't reduce the need for retries.

The even-bigger picture here is that both this issue and the need for
DSA and DSM are due to the fact that we've picked an unpopular
programming model with poor operating system support. If we switch
from using processes to using threads, we don't have to deal with all
of this nonsense any more, and we'd solve some other problems, too. For
example, I think backend startup would get quite a bit faster, at least
on Windows. Obviously, anybody else who is using processes + shared
memory is going to run into the same problems we're hitting, and if
operating system manufacturers wanted to make this kind of programming
easy, they could do it. We're expending a lot of effort here because
we're swimming against the current.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
