Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table
Date: 2022-01-29 15:23:49
Message-ID: CAK-MWwReitx16bDQjT+ffAFwkQ_+2hdM+hwMvkRmaM835UxVrQ@mail.gmail.com
Lists: pgsql-bugs

On Sat, Jan 29, 2022 at 7:43 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Fri, Jan 28, 2022 at 07:00:31PM -0800, Peter Geoghegan wrote:
> > If I had to guess, then I'd guess that this has something to do with
> > orphaned HOT chains, like those we saw in the bug report that led to
> > bugfix commit 18b87b20 (which is in 14.2 but not 14.1). I could easily
> > be wrong about that, so take it with a grain of salt. I find it a
> > little suspicious that we're hearing about a REINDEX CONCURRENTLY
> > problem in Postgres 14, which is much less mature than Postgres 12
> > (where REINDEX CONCURRENTLY first appeared).
>
> Possible, but hard to say without an actual proof. Maxim, if the
> problem is reproducible easily on your end, could you give a try to v12
> and v13 and see if it happens as well there?
> --
> Michael
>

I don't remember such problems during the last year on v13 with the same
workload and the same periodic reindex.
It isn't easily reproduced: the table in question is 800GB in size (almost
2TB with indexes), with around 1000 rows/s updated (5k at peak). Under such
load, the chance of hitting the error seems to be around 60% (i.e. more than
half of the reindex attempts end with a broken index).
I have a suitably powerful server for tests, but there is no good way to
simulate the production workload (especially reproducibly).
I'll see what I can do next week.

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Doctor, you advised me not to do that, but why does it still hurt
when I do it again?"
