Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(at)paquier(dot)xyz>, Петър Славов <pet(dot)slavov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Date: 2022-05-28 14:46:40
Message-ID: 4B1BA5BF-37EC-4EE7-AB3B-4C4D95A8059A@yandex-team.ru
Lists: pgsql-bugs

> On 28 May 2022, at 12:02, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> I think you basically need to force some, but not all, of the modifying
> transactions to be open for a bit longer, so that it's more likely that
> there's a chance to prune vs CIC waiting. Might also be helpful to update rows
> multiple times within an xact.

Now I've got two different versions of the test for the master branch. Both fail in 50% of cases on my machine. Both take approximately 4 seconds of wall-clock time and 1 second of CPU time.

v3: a fraction of the transactions wait, using pg_sleep().
This test fails with an assertion failure in heap_page_prune:
0 postgres 0x00000001049ec508 ExceptionalCondition + 124
1 postgres 0x00000001045ea284 heap_page_prune + 2992
2 postgres 0x00000001045e9670 heap_page_prune_opt + 424
3 postgres 0x00000001045e25c0 heapam_index_fetch_tuple + 140
4 postgres 0x0000000100272d60 index_fetch_heap + 104
5 postgres 0x0000000100272e18 index_getnext_slot + 88
6 postgres 0x00000001003bbf4c check_exclusion_or_unique_constraint + 440
7 postgres 0x00000001003bc360 ExecCheckIndexConstraints + 232
8 postgres 0x00000001003ea30c ExecInsert + 1024
9 postgres 0x00000001003e90cc ExecModifyTable + 1536
10 postgres 0x00000001003bd0cc standard_ExecutorRun + 268
11 postgres 0x0000000100542d94 ProcessQuery + 160
12 postgres 0x00000001005423c8 PortalRunMulti + 396
13 postgres 0x0000000100541cfc PortalRun + 476

And reverting d9d0762 does not fix the issue. I'm not sure if I'm observing some other problem here.
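
For illustration only (this is not the attached patch), the workload shape suggested above, where only some transactions stay open longer and a row is updated several times within one xact, could be sketched as a generator for a pgbench custom script. The table name tbl, the columns i and updated_at, the key range, and the default parameters are all assumptions made for the example.

```python
def make_pgbench_script(sleep_percent: int = 10,
                        updates_per_xact: int = 3,
                        sleep_seconds: float = 0.01,
                        table: str = "tbl") -> str:
    """Build a pgbench custom script as text.

    Table and column names are hypothetical; they are not taken
    from the attached v3/v4 patches.
    """
    lines = [
        "\\set id random(1, 1000)",   # row to modify (assumed key range)
        "\\set r random(1, 100)",     # decides whether this xact waits
        "BEGIN;",
    ]
    # Update the same row several times within one transaction.
    for _ in range(updates_per_xact):
        lines.append(
            f"UPDATE {table} SET updated_at = now() WHERE i = :id;"
        )
    # Only a fraction of the transactions stay open a bit longer.
    lines += [
        f"\\if :r <= {sleep_percent}",
        f"SELECT pg_sleep({sleep_seconds});",
        "\\endif",
        "COMMIT;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pgbench_script(), end="")
```

The resulting text would be saved to a file and passed to pgbench with -f while REINDEX INDEX CONCURRENTLY runs in another session; the index can then be checked with amcheck's bt_index_check(index, true).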

v4 of the test does not use pg_sleep() and fails with a regular amcheck failure. Reverting d9d0762 fixes the test, unless I run it for 1 million transactions; then it fails even with the revert...

I suspect that v3 and v4 trigger different problems.

Best regards, Andrey Borodin.

Attachment                                                       Content-Type              Size
v4-0001-Add-TAP-test-for-REINDEX-CONCURRENTLY-with-HOT-up.patch  application/octet-stream  3.4 KB
v3-0001-Add-TAP-test-for-REINDEX-CONCURRENTLY-with-HOT-up.patch  application/octet-stream  3.5 KB
