Re: Re: windows 8 RTM compatibility issue (could not reserve shared memory region for child)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Dave Vitek <dvitek(at)grammatech(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, v-seishi(at)microsoft(dot)com
Subject: Re: Re: windows 8 RTM compatibility issue (could not reserve shared memory region for child)
Date: 2015-06-24 07:03:53
Message-ID: CAB7nPqSi9WaP1AycLxc8K1szGFTHcNzEuJeCY2bNzCOhddzMbg@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jun 24, 2015 at 12:29 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Wed, Jun 24, 2015 at 10:06:00AM +0900, Michael Paquier wrote:
>
>> > On Tue, Sep 04, 2012 at 11:45:47PM -0400, Dave Vitek wrote:
>> >> LOG: could not reserve shared memory region (addr=0000000001410000) for child
>> >> 0000000000000F8C: 487
>> >> LOG: could not fork new process for connection: A blocking operation was
>> >> interrupted by a call to WSACancelBlockingCall.
>
>> It turns out that this issue can still be hit, at least on Win2k12
>> boxes (I have received some complaints about that), even when
>> RandomizedBaseAddress is disabled in the build as a result of the
>> following thread:
>> http://www.postgresql.org/message-id/BD0D89EC2438455C9DE0DC94D36912F4@maumau
>
> That report led to the RandomizedBaseAddress="FALSE" commit, so the report was
> not based on such a build. If you have received complaints definitively
> involving a RandomizedBaseAddress="FALSE" build, that is novel evidence. If
> these complaints involved publicly-available binaries, which exact binaries
> (download URL)? If not, what do you know about how the binaries were built?

They are not publicly available, but they are built using the
community Perl scripts with Visual Studio 2008, with one slight
difference in the VC spec file: AdditionalOptions includes /DLL to
tell the linker to build DLLs. I don't think that is related to the
failure, unless I am missing a crucial piece of information about
Visual Studio.

By the way, the failure looks very similar to the one in this thread
and to the one reported by MauMau
(http://www.postgresql.org/message-id/BD0D89EC2438455C9DE0DC94D36912F4@maumau):
2015-06-23 13:00:24.989 PDT 55898d60.1388 0 LOG: could not reserve shared memory region (addr=00000000013C0000) for child 0000000000001868: error code 487
2015-06-23 13:00:24.989 PDT 55898d60.1388 0 LOG: could not fork autovacuum worker process: A blocking operation was interrupted by a call to WSACancelBlockingCall.

This happens periodically, every 10 to 20 minutes, mostly with
autovacuum workers and sometimes with child backends, leading to
connection failures. My assumption is that the chances of hitting
this failure grow with a high number of concurrent connections and
high memory usage on the system.

>> I am wondering if we could do better than what we have now with
>> retry logic in the thread fork loop, as using a non-NULL
>> lpBaseAddress in MapViewOfFileEx to make the address selection more
>> random seems like a dead end: this base address selection would be
>> system-dependent.
>
> I don't understand exactly what you're proposing here. Are you proposing to
> retry backend creation after a child can't reattach to shared memory?

Yes, on Windows, to alleviate the failure for applications impacted
by it, as disabling ASLR does not seem to be enough in some
environments.
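To make the retry idea concrete, here is a rough C sketch. It is only an illustration under stated assumptions: `SimulatedLauncher`, `try_launch_child()` and `launch_child_with_retry()` are hypothetical names, not existing PostgreSQL code, and child creation is simulated with a counter instead of actually starting a process and re-reserving the shared memory region in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the child launcher. In a real build, one
 * attempt would create the child and have it reserve the shared memory
 * region at the parent's address; here the first `failures_left`
 * attempts simply fail, simulating the "error code 487" case. */
typedef struct
{
    int failures_left;
} SimulatedLauncher;

/* One simulated launch attempt: returns false while failures remain. */
static bool
try_launch_child(SimulatedLauncher *launcher)
{
    if (launcher->failures_left > 0)
    {
        launcher->failures_left--;
        return false;           /* child could not reserve the region */
    }
    return true;
}

/* Retry child creation up to max_tries times. Returns the 1-based number
 * of the attempt that succeeded, or -1 if every attempt failed, in which
 * case the caller would still report the usual "could not fork" error. */
static int
launch_child_with_retry(SimulatedLauncher *launcher, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++)
    {
        if (try_launch_child(launcher))
            return attempt;
        fprintf(stderr,
                "attempt %d/%d: could not reserve shared memory region for child, retrying\n",
                attempt, max_tries);
    }
    return -1;
}
```

Bounding the number of attempts keeps an environment that fails persistently from looping forever, while transient address-space conflicts get another chance before turning into a user-facing failure.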

> That is
> better than a user-facing failure, but let's start with a diligent attempt to
> root-cause the complaints you have received.

Sure.
--
Michael
