Re: "could not reattach to shared memory" on buildfarm member dory

From: Noah Misch <noah(at)leadboat(dot)com>
To: Heath Lord <heath(dot)lord(at)crunchydata(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: "could not reattach to shared memory" on buildfarm member dory
Date: 2018-12-03 05:35:06
Message-ID: 20181203053506.GB2860387@rfd.leadboat.com
Lists: pgsql-hackers

On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote:
> On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
> > Overall, I agree that neither of these approaches are exactly attractive.
> > We're paying a heck of a lot of performance or complexity to solve a
> > problem that shouldn't even be there, and that we don't understand well.
> > In particular, the theory that some privileged code is injecting a thread
> > into every new process doesn't square with my results at
> > https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us
> >
> > I think our best course of action at this point is to do nothing until
> > we have a clearer understanding of what's actually happening on dory.
> > Perhaps such understanding will yield an idea for a less painful fix.
>
> I see.

Could someone with a dory login use
https://live.sysinternals.com/Procmon.exe to capture process events during
backend startup? The ideal would be one capture where startup failed to
reattach and another where it succeeded, but the successful run alone would be
a good start. The procedure is roughly this:

- Install PostgreSQL w/ debug symbols.
- Start a postmaster.
- procmon /nomonitor
- procmon "Filter" menu -> Enable Advanced Output
- Ctrl-l, add filter for "Process Name" is "postgres.exe"
- Ctrl-e (starts collecting data)
- psql (leave it running)
- After ~60s, Ctrl-e again in procmon (stops collecting data)
- File -> Save -> PML
- File -> Save -> XML, include stack traces, resolve stack symbols
- Compress the PML and XML files, and mail them here

I'm attaching the data from a system that does not have the problem. On this
system, backend startup sees six thread creations:

1. main thread
2. thread created before postgres.exe has control
3. thread created before postgres.exe has control
4. thread created before postgres.exe has control
5. in pgwin32_signal_initialize()
6. in src\backend\port\win32\timer.c:setitimer()

Threads 2-4 exit exactly 30s after creation. If we fail to reattach to shared
memory, we'll exit before reaching the code that starts threads 5 and 6. It
would be quite interesting if dory creates a different number of threads or if
threads 2-4 live for some duration other than 30s. It would also be
interesting if dory has "Load Image" events after postgres.exe code has
started running. This unaffected system loads mswsock.dll during
read_inheritable_socket().
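
For anyone comparing captures, here is a rough Python sketch of how the XML
export could be summarized per backend: thread count, thread lifetimes, and
"Load Image" events. It is not something I ran against the attached files. The
element names (event, Time_of_Day, Process_Name, PID, Operation, Path, Detail),
the "Thread ID: N" text in Detail, and the "h:mm:ss.fffffff AM" time format are
my assumptions about the Procmon XML layout; adjust them to match the real
export. The function name summarize() is just illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Assumed formats (not confirmed): Detail contains "Thread ID: <n>",
# Time_of_Day looks like "5:35:06.1234567 AM".
TID_RE = re.compile(r"Thread ID:\s*(\d+)")
TIME_RE = re.compile(r"(\d+):(\d+):(\d+(?:\.\d+)?)\s*([AP]M)?", re.I)


def seconds_of_day(text):
    """Convert a time such as '5:35:06.1234567 AM' to seconds since midnight."""
    m = TIME_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognized time: {text!r}")
    hour, minute, second = int(m.group(1)), int(m.group(2)), float(m.group(3))
    ampm = (m.group(4) or "").upper()
    if ampm:
        hour = hour % 12 + (12 if ampm == "PM" else 0)
    return hour * 3600 + minute * 60 + second


def summarize(xml_text, process_name="postgres.exe"):
    """Return {pid: {"threads": {tid: lifetime or None}, "images": [(t, path)]}}.

    A lifetime of None means no matching "Thread Exit" was captured.
    Captures that cross midnight are not handled.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    created = {}  # (pid, tid) -> creation time in seconds since midnight
    for ev in root.iter("event"):
        if ev.findtext("Process_Name", "") != process_name:
            continue
        pid = ev.findtext("PID", "")
        op = ev.findtext("Operation", "")
        when = seconds_of_day(ev.findtext("Time_of_Day", ""))
        entry = summary.setdefault(pid, {"threads": {}, "images": []})
        m = TID_RE.search(ev.findtext("Detail", ""))
        if op == "Thread Create" and m:
            entry["threads"][m.group(1)] = None
            created[(pid, m.group(1))] = when
        elif op == "Thread Exit" and m and (pid, m.group(1)) in created:
            entry["threads"][m.group(1)] = when - created[(pid, m.group(1))]
        elif op == "Load Image":
            entry["images"].append((when, ev.findtext("Path", "")))
    return summary
```

Calling summarize() on the decompressed XML text and printing, for each PID,
len(entry["threads"]) plus the lifetimes would show at a glance whether a
backend differs from the six-thread, 30s pattern described above.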

Thanks,
nm

Attachment                  Content-Type              Size
unaffected-gustnado.XML.xz  application/octet-stream  91.2 KB
unaffected-gustnado.PML.xz  application/octet-stream  122.5 KB
