Re: BUG #17245: Index corruption involving deduplicated entries

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Herman verschooten <Herman(at)verschooten(dot)ne>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Herman verschooten <Herman(at)verschooten(dot)net>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-28 09:49:57
Message-ID: CAApHDvoeu9rtEwdawqeiFtyMaDbaTgEHu1Ns-2YSo+A+_5v5GA@mail.gmail.com
Lists: pgsql-bugs

On Thu, 28 Oct 2021 at 14:36, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I'll take another guess: I wonder if commit 56788d21 ("Allocate
> consecutive blocks during parallel seqscans") is somehow causing
> parallel CREATE INDEX to produce wrong results. The author (David
> Rowley) is CC'd. Does a bug in that commit seem like it might explain
> this problem, David? Might parallel workers in a parallel index build
> somehow become confused about which worker is supposed to scan which
> heap block, leading to duplicate TIDs in the final index?

I stared at that code for a while and didn't really see how it could
be broken, unless the atomics implementation on that machine were
broken. Thomas and I had a look at that earlier, and on his FreeBSD
machine it uses the arch-x64.h implementation of
pg_atomic_fetch_add_u64_impl().

I also noted that pg_atomic_fetch_add_u64() is not really getting much
use; regress.c is the only other location. However, I really doubt
that this is the issue here.

Just to see what it would look like if this was broken, I went and
mocked up such a bug by adding the following code just above "return
page;" at the end of table_block_parallelscan_nextpage:

    if (page == 1000)
        page = 999;
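
For anyone who wants to see the effect of that two-line change without
patching the server, here is a rough standalone model of it. It is only
a sketch: NBLOCKS, next_page() and scan() are names made up for this
illustration, the real table_block_parallelscan_nextpage() hands out
blocks in chunks and is more involved, and C11 atomic_fetch_add stands
in for pg_atomic_fetch_add_u64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 1100    /* hypothetical relation size, in blocks */

/* Shared allocation counter, standing in for the parallel scan state. */
static atomic_uint_fast64_t next_block;

/*
 * Hand out the next block number, or -1 once every block has been
 * allocated.  With inject_bug set, block 1000 is misreported as 999,
 * mirroring the mocked-up bug.
 */
static int64_t
next_page(int inject_bug)
{
    /* fetch-add returns the value the counter held before the add */
    uint64_t page = atomic_fetch_add(&next_block, 1);

    if (page >= NBLOCKS)
        return -1;
    if (inject_bug && page == 1000)
        page = 999;
    return (int64_t) page;
}

/* Scan the whole "relation" once, counting how often each block comes back. */
static void
scan(int inject_bug, int *visits)
{
    int64_t page;

    atomic_store(&next_block, 0);
    for (int i = 0; i < NBLOCKS; i++)
        visits[i] = 0;

    while ((page = next_page(inject_bug)) >= 0)
    {
        assert(page < NBLOCKS);
        visits[page]++;
    }
}
```

With inject_bug = 0 every block is counted exactly once. With
inject_bug = 1, block 999 is counted twice and block 1000 is never
visited, which is the pattern the queries below expose.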

I then did:

create table b (b int not null, t text not null);
insert into b select x,x::text from generate_series(1,200000)x;
set max_parallel_workers_per_gather=0;
select sum(b), sum(length(t)) from b;
set max_parallel_workers_per_gather=2;
select sum(b), sum(length(t)) from b;

I noted that I get different results between the parallel and
non-parallel query due to page 999 being read twice. I then created
the following index:

set max_parallel_maintenance_workers = 2;
create index on b(t);

If I run a query to perform an index scan to find any value of "t"
that's on page 999, then I get:

postgres=# select ctid,* from b where t = '185050';
   ctid   |   b    |   t
----------+--------+--------
 (999,54) | 185050 | 185050
 (999,54) | 185050 | 185050
(2 rows)

We just get the same tid twice in the index. That's quite different
from another value of "t" getting into the list of tids for '185050'.
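
To make that distinction concrete, a repeated tid is something that can
be spotted from the tid list alone, whereas a tid belonging to some
other value of "t" can only be caught by going back to the heap. A
minimal sketch of the first check follows, assuming a list that is
already sorted. The Tid struct and both functions are hypothetical
stand-ins for illustration, not PostgreSQL's ItemPointerData or any
nbtree code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap tid: (block number, line pointer offset). */
typedef struct Tid
{
    uint32_t block;
    uint16_t offset;
} Tid;

/* Order tids by block, then by offset; returns <0, 0 or >0. */
static int
tid_cmp(const Tid *a, const Tid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/*
 * Count entries that repeat their predecessor in a tid list sorted in
 * ascending order.  A healthy list gives 0; a block scanned twice during
 * the index build shows up as one repeat per tuple on that block.
 */
static size_t
count_duplicate_tids(const Tid *tids, size_t ntids)
{
    size_t dups = 0;

    for (size_t i = 1; i < ntids; i++)
    {
        if (tid_cmp(&tids[i - 1], &tids[i]) == 0)
            dups++;
    }
    return dups;
}
```

Fed the two tids from the output above, (999,54) and (999,54), this
returns 1. It says nothing about a tid that is unique in the list but
points at a row with a different "t".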

David
