Re: subscriptionCheck failures on nightjar

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>,Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>,Michael Paquier <michael(at)paquier(dot)xyz>,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 22:11:22
Message-ID: CE2CE820-8A48-4C0D-B06A-9BD3B1E26D43@anarazel.de
Lists: pgsql-hackers

Hi,

On September 20, 2019 3:06:20 PM PDT, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>On 2019-Sep-20, Tom Lane wrote:
>
>> Actually, what I did was as attached [1], and I am getting traces
>> like [2]. The problem seems to occur only when there are two or three
>> processes concurrently creating the same snapshot file. It's not
>> obvious from the debug trace, but the snapshot file *does* exist
>> after the music stops.
>
>Uh .. I didn't think it was possible that we would build the same
>snapshot file more than once. Isn't that a waste of time anyway?
>Maybe we can fix the symptom by just not doing that in the first place?
>I don't have a strategy to do that, but seems worth considering before
>retiring the bf animals.

We try to avoid it, but the existence check is racy; see the comments in SnapBuildSerialize. We could introduce locking etc. to avoid that, but that seems like overkill, given that we're really just dealing with a broken OS.
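To illustrate why losing that race should be harmless on a correctly behaving OS, here is a minimal sketch of the check-then-serialize pattern. It assumes each writer goes through its own private temporary file and then atomically renames it into place (POSIX rename semantics), and that every writer produces identical contents. The function name `serialize_snapshot`, its parameters and the temp-file naming are hypothetical; this is not the actual PostgreSQL code, and it is written in Python rather than C only for brevity.

```python
import os
import tempfile
import threading


def serialize_snapshot(directory: str, name: str, data: bytes) -> bool:
    """Hypothetical sketch: write `data` to directory/name unless it exists.

    Returns True if this caller renamed its file into place, False if it
    skipped because the final file already existed at check time.
    """
    final_path = os.path.join(directory, name)

    # Racy existence check: another writer may create the file right after
    # this test fails. That only costs duplicate work, not correctness.
    if os.path.exists(final_path):
        return False

    # Private temp file per writer (PID plus thread id here), so concurrent
    # writers never write into the same file.
    tmp_path = f"{final_path}.{os.getpid()}.{threading.get_ident()}.tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # os.replace() is atomic on POSIX: readers see either no file or a
    # complete one. If several writers race, the last rename wins, which is
    # harmless because all of them wrote identical contents.
    os.replace(tmp_path, final_path)
    return True


if __name__ == "__main__":
    # Three concurrent "builders" of the same snapshot file.
    with tempfile.TemporaryDirectory() as d:
        workers = [
            threading.Thread(target=serialize_snapshot,
                             args=(d, "0-1.snap", b"same bytes"))
            for _ in range(3)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Only the final file should remain; no leftover .tmp files.
        print(sorted(os.listdir(d)))
```

Under these assumptions, adding a lock would only save the duplicated serialization work; it would not change the end result, which is why a filesystem that leaves a missing or partial file behind points at the OS rather than at the race.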

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.