Re: logical decoding / rewrite map vs. maxAllocatedDescs

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding / rewrite map vs. maxAllocatedDescs
Date: 2018-08-14 14:05:29
Message-ID: 9552a9ef-250d-c7bf-abca-0c0533215fee@2ndquadrant.com
Lists: pgsql-hackers

On 08/14/2018 01:49 PM, Tomas Vondra wrote:
> On 08/13/2018 04:49 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2018-08-13 11:46:30 -0300, Alvaro Herrera wrote:
>>> On 2018-Aug-11, Tomas Vondra wrote:
>>>
>>>> Hmmm, it's difficult to compare "bt full" output, but my backtraces look
>>>> somewhat different (and all the backtraces I'm seeing are 100% exactly
>>>> the same). Attached for comparison.
>>>
>>> Hmm, looks similar enough to me -- at the bottom you have the executor
>>> doing its thing, then an AcceptInvalidationMessages in the middle
>>> section atop which sit a few more catalog accesses, and further up from
>>> that you have another AcceptInvalidationMessages with more catalog
>>> accesses.  AFAICS that's pretty much the same thing Andres was
>>> describing.
>>
>> It's somewhat different because it doesn't seem to involve a reload of a
>> nailed table, which my traces did.  I wasn't able to reproduce the crash
>> more than once, so I'm not at all sure how to properly verify the issue.
>> I'd appreciate it if Tomas could try to do so again with the small patch I
>> provided.
>>
>
> I'll try in the evening. I've tried reproducing it on my laptop, but I
> can't make that happen for some reason - I know I've seen some crashes
> here, but all the reproducers were from the workstation I have at home.
>
> I wonder if there's some subtle difference between the two boxes, making
> it more likely on one of them ... the whole environment (distribution,
> packages, compiler, ...) should be exactly the same, though. The only
> thing I can think of is different CPU speed, possibly making some race
> conditions more/less likely. No idea.
>

I take that back: I can reproduce the crashes, both with and without
the patch, on every version back to 9.6. Attached is a bunch of backtraces
from various versions. There's a bit of variability depending on which
pgbench script gets started first (insert.sql or vacuum.sql): when vacuum
is started before insert, the crash happens in
InitPostgres/RelationCacheInitializePhase3; otherwise it happens in
exec_simple_query.

Another observation is that the failing COPY is not necessary: I can
reproduce the crashes without it (so even with wal_level=replica).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment                Content-Type       Size
crash-10.log.gz           application/gzip   9.2 KB
crash-11.log.gz           application/gzip   9.3 KB
crash-11-2.log.gz         application/gzip   13.1 KB
crash-11-3.log.gz         application/gzip   10.3 KB
crash-96.log.gz           application/gzip   10.1 KB
crash-96-2.log.gz         application/gzip   10.1 KB
crash-96-logical.log.gz   application/gzip   11.9 KB
