Re: Crash in new pgstats code

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <fujii(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Crash in new pgstats code
Date: 2022-04-18 14:50:03
Message-ID: 20220418145003.ni7tl6pmyokvj2ie@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-04-18 22:45:07 +1200, Thomas Munro wrote:
> On Mon, Apr 18, 2022 at 7:19 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > On Sat, Apr 16, 2022 at 02:36:33PM -0700, Andres Freund wrote:
> > > which I haven't seen locally. Looks like we have some race between
> > > startup process and walreceiver? That seems not great. I'm a bit
> > > confused that walreceiver and archiving are both active at the same time
> > > in the first place - that doesn't seem right as things are set up
> > > currently.
> >
> > Yeah, that should be exclusively one or the other, never both.
> > WaitForWALToBecomeAvailable() would be a hot spot when it comes to
> > decide when a WAL receiver should be spawned by the startup process.
> > Except from the recent refactoring of xlog.c or the WAL prefetch work,
> > there has not been many changes in this area lately.
>
> Hmm, well I'm not sure what is happening here and will try to dig
> tomorrow, but one observation from some log scraping is that kestrel
> logged similar output with "could not link file" several times before
> the main prefetching commit (5dc0418). I looked back 3 months on
> kestrel/HEAD and found these:

Kestrel doesn't even go back that far - I set it up 23 days ago...

I'm formally on vacation till Thursday, I'll try to look at earlier
instances then. Unless it's already figured out :). I failed at
reproducing it locally, despite a fair bit of effort.

The BF really should break out individual tests into their own stage
logs. The recovery-check stage is 13MB and 150k lines by now.

Greetings,

Andres Freund
