From: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [BUG] Error in BRIN summarization |
Date: | 2020-08-10 17:30:44 |
Message-ID: | d3498a6d-bd6d-83f3-8823-547339560f49@postgrespro.ru |
Lists: | pgsql-hackers |
On 30.07.2020 16:40, Anastasia Lubennikova wrote:
> While testing this fix, Alexander Lakhin spotted another problem.
>
> After a few runs, it will fail with "ERROR: corrupted BRIN index:
> inconsistent range map"
>
> The problem is caused by a race in page locking in
> brinGetTupleForHeapBlock [1]:
>
> (1) bitmapscan locks revmap->rm_currBuf and finds the address of the
> tuple on a regular page "page", then unlocks revmap->rm_currBuf
> (2) in another transaction desummarize locks both revmap->rm_currBuf
> and "page", cleans up the tuple and unlocks both buffers
> (1) bitmapscan locks buffer, containing "page", attempts to access the
> tuple and fails to find it
>
>
> At first, I tried to fix it by holding the lock on revmap->rm_currBuf
> until we locked the regular page, but it causes a deadlock with
> brinsummarize(). It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages
> in BRIN? I haven't found anything in the README.
>
> As an alternative, we can leave locks as is and add a recheck, before
> throwing an error.
>
Here are the updated patches for both problems.
1) brin_summarize_fix_REL_12_v2.patch fixes
"failed to find parent tuple for heap-only tuple at (50661,130) in table
"tbl""
This patch checks that we only access initialized entries of the
root_offsets[] array. If necessary, it collects the array again. One recheck
is enough here, since concurrent pruning is not possible.
2) brin_pagelock_fix_REL_12_v1.patch fixes
"ERROR: corrupted BRIN index: inconsistent range map"
This patch adds a recheck as suggested in the previous message.
I am not sure if one recheck is enough to eliminate the race completely,
but the problem cannot be reproduced anymore.
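
To illustrate the race and the recheck idea, here is a toy single-process model. It is only a sketch and not the code from the patch: ToyBrin, get_tuple and the between_locks hook are hypothetical names, locks are not modelled at all, and the hook simply stands for the window between unlocking the revmap buffer and locking the regular page.

```python
class ToyBrin:
    """Toy model: revmap maps a range number to a slot on a regular page."""

    def __init__(self):
        self.revmap = {0: 1}           # range number -> tuple offset
        self.page = {1: "summary-0"}   # tuple offset -> summary tuple

    def desummarize(self, rng):
        # The real code holds both buffer locks here; in this model the
        # whole method simply runs without interruption.
        off = self.revmap.pop(rng, None)
        if off is not None:
            del self.page[off]


def get_tuple(idx, rng, between_locks=None, recheck=True):
    """Look up the summary tuple for a range, optionally with one recheck."""
    off = idx.revmap.get(rng)          # read revmap, then "unlock" it
    if off is None:
        return None                    # range is not summarized
    if between_locks is not None:
        between_locks()                # another session may run here
    tup = idx.page.get(off)            # "lock" the regular page and look
    if tup is not None:
        return tup
    if recheck:
        current = idx.revmap.get(rng)  # re-read revmap before complaining
        if current != off:
            # The revmap entry changed under us: a concurrent change,
            # not corruption. Follow the new entry, if there is one.
            return None if current is None else idx.page.get(current)
    raise RuntimeError("corrupted BRIN index: inconsistent range map")


if __name__ == "__main__":
    idx = ToyBrin()
    # With the recheck, a concurrent desummarize yields "not summarized".
    print(get_tuple(idx, 0, between_locks=lambda: idx.desummarize(0)))
```

Calling get_tuple with recheck=False and the same hook raises the error, which corresponds to the failure described in the quoted message. The model does a single recheck, so it shares the open question of whether one recheck is always enough.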
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
brin_pagelock_fix_REL_12_v1.patch | text/x-patch | 1.5 KB |
brin_summarize_fix_REL_12_v2.patch | text/x-patch | 5.6 KB |