Re: logical decoding / rewrite map vs. maxAllocatedDescs

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding / rewrite map vs. maxAllocatedDescs
Date: 2018-08-10 21:59:44
Message-ID: 470adb65-5101-4659-d213-41bde1eef8f2@2ndquadrant.com
Lists: pgsql-hackers

On 08/10/2018 11:13 PM, Andres Freund wrote:
> On 2018-08-10 22:57:57 +0200, Tomas Vondra wrote:
>>
>>
>> On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
>>> On 2018-Aug-09, Tomas Vondra wrote:
>>>
>>>> I suppose there are reasons why it's done this way, and admittedly the test
>>>> that happens to trigger this is a bit extreme (essentially running pgbench
>>>> concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
>>>> extreme enough to deem it not an issue, because people using many temporary
>>>> tables often deal with bloat by doing frequent vacuum full on catalogs.
>>>
>>> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
>>> the file descriptor for no good reason. There's a different
>>> OpenTransientFile call in ReorderBufferRestoreChanges that is not
>>> intended to be closed immediately, but the other one seems a plain bug,
>>> easy enough to fix.
>>>
>>
>> Indeed. Adding a CloseTransientFile to ApplyLogicalMappingFile solves
>> the issue with hitting maxAllocatedDescs. Barring objections I'll commit
>> this shortly.
>
> Yea, that's clearly a bug. I've not seen a patch, so I can't quite
> formally sign off, but it seems fairly obvious.
>
>
>> But while running the tests on this machine, I repeatedly got pgbench
>> failures like this:
>>
>> client 2 aborted in command 0 of script 0; ERROR: could not read block
>> 3 in file "base/16384/24573": read only 0 of 8192 bytes
>>
>> That kinda reminds me of the issues we're observing on some buildfarm
>> machines, I wonder if it's the same thing.
>
> Oooh, that's interesting! What's the precise recipe that gets you there?
>

I don't have an exact reproducer - it's kinda rare and unpredictable,
and I'm not sure how much it depends on the environment etc. But I'm
doing this:

1) one cluster with publication (wal_level=logical)

2) one cluster with subscription to (1)

3) simple table, replicated from (1) to (2)

-- publisher
create table t (a serial primary key, b int, c int);
create publication p for table t;

-- subscriber
create table t (a serial primary key, b int, c int);
create subscription s CONNECTION '...' publication p;

4) pgbench inserting rows into the replicated table

pgbench -n -c 4 -T 300 -p 5433 -f insert.sql test
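(The insert.sql attachment isn't shown inline, but given step 4 it is
presumably a single-row insert into the replicated table; a plausible
sketch, not necessarily the attachment's exact contents:

```sql
-- insert.sql (guessed from the description above; serial column a
-- fills itself, so only b and c need values)
insert into t (b, c) values (1, 2);
```

)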

5) pgbench doing vacuum full on pg_class

pgbench -n -f vacuum.sql -T 300 -p 5433 test
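(Likewise, vacuum.sql presumably matches the "vacuum full pg_class in a
loop" recipe mentioned upthread; a plausible sketch:

```sql
-- vacuum.sql (guessed; pgbench -T 300 loops this for five minutes)
vacuum full pg_class;
```

)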

And once in a while I see failures like this:

client 0 aborted in command 0 of script 0; ERROR: could not read
block 3 in file "base/16384/86242": read only 0 of 8192 bytes

client 3 aborted in command 0 of script 0; ERROR: could not read
block 3 in file "base/16384/86242": read only 0 of 8192 bytes

client 2 aborted in command 0 of script 0; ERROR: could not read
block 3 in file "base/16384/86242": read only 0 of 8192 bytes

or this:

client 2 aborted in command 0 of script 0; ERROR: could not read
block 3 in file "base/16384/89369": read only 0 of 8192 bytes

client 1 aborted in command 0 of script 0; ERROR: could not read
block 3 in file "base/16384/89369": read only 0 of 8192 bytes

I suspect there's some other ingredient, e.g. some manipulation with the
subscription. Or maybe it's not needed at all and I'm just imagining things.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
vacuum.sql application/sql 21 bytes
insert.sql application/sql 34 bytes
