Re: Something fishy happening on frogmouth

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Something fishy happening on frogmouth
Date: 2013-10-30 12:45:03
Message-ID: CA+Tgmob859sdnDuHAF31BE55qEoREnCzzWeqDbgkNB0d_F+zmA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 29, 2013 at 3:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The last two buildfarm runs on frogmouth have failed in initdb,
> like this:
>
> creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting dynamic shared memory implementation ... windows
> creating configuration files ... ok
> creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
> child process exited with exit code 1
>
> It shouldn't be failing like that, considering that we just finished
> probing for acceptable max_connections and shared_buffers without hitting
> any apparent limit. I suppose it's possible that the final shm segment
> size is a bit larger than what was tested at the shared_buffer step,
> but that doesn't seem very likely to be the explanation. What seems
> considerably more probable is that the probe for a shared memory
> implementation is screwing up the system state somehow. It may not be
> unrelated that this machine was happy before commit d2aecae went in.

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

    dsm_control_handle = random();

One possibility that occurs to me is that if, for some reason, we're
using the same handle every time on Windows, and if Windows takes a
bit of time to reclaim the segment after the postmaster exits (which
is not hard to believe given some previous Windows behavior I've
seen), then running the postmaster lots of times in quick succession
(as initdb does) might fail. I dunno what that has to do with the
patch, though.
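
To make the "same handle every time" scenario concrete, here is a minimal sketch. It is not PostgreSQL code: it uses the standard C rand()/srand() as a stand-in for random(), and the helper names are hypothetical. It relies on the C standard's rule that rand() called before any srand() behaves as if srand(1) had been called, so a process that never seeds the generator draws the same first value on every start. Whether that is what is actually happening on frogmouth is an assumption, not something verified here.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: seed the generator and return the first value drawn.
 * Stands in for "start a process, then pick a handle with one PRNG call".
 */
static int
first_value_for_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}

/*
 * Emulates a freshly started process that never calls srand().
 * Per the C standard, rand() then behaves as if srand(1) had been called,
 * so every such "process" sees the same first value.
 */
static int
first_value_unseeded_process(void)
{
    return first_value_for_seed(1);
}
```

Calling first_value_unseeded_process() repeatedly yields the same number each time, which is the analogue of every postmaster start picking the same segment name. Seeding from something that varies per start (time, PID) would avoid that, provided the seeding runs before the handle is chosen.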

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
