Re: BUG #17245: Index corruption involving deduplicated entries

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Date: 2021-10-28 22:48:31
Message-ID: 20211028224831.bj7ew3j74tw4cmvh@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-10-28 15:23:38 -0700, Peter Geoghegan wrote:
> Anything is possible. But Kamigishi Rei has said that this database
> has never had a hard crash or unclean shut down, which I definitely
> believe. Also, they are using ECC on a Xeon processor. This is the
> kind of hardware that is generally assumed to be very reliable.

That wouldn't protect against, e.g., a logic bug in ZFS. Given its copy-on-write
nature, corruption could very well manifest as seeing an older version of the
data when re-reading it from disk, which could in turn lead to the type of
corruption we're seeing here.
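To make the stale-read scenario concrete, here is a toy model in Python. It is not PostgreSQL or ZFS code. `CowStore`, `stale_blocks`, `check_index` and the block numbers are all hypothetical. The store never overwrites a block in place, and a simulated bug makes reads of one block return its previous version. The heap and index were written consistently, yet after the heap block is read back stale, the index points at a slot that the heap page no longer contains.

```python
# Toy model only: hypothetical names, not PostgreSQL or ZFS internals.
HEAP_BLK = 0   # hypothetical block number for the "heap" page
INDEX_BLK = 1  # hypothetical block number for the "index" page


class CowStore:
    """Copy-on-write block store: every write appends a new version of the block."""

    def __init__(self):
        self.versions = {}         # block number -> list of versions, oldest first
        self.stale_blocks = set()  # blocks affected by the simulated stale-read bug

    def write(self, blkno, data):
        # Never overwrite in place; append a new version instead.
        self.versions.setdefault(blkno, []).append(dict(data))

    def read(self, blkno):
        history = self.versions[blkno]
        if blkno in self.stale_blocks and len(history) > 1:
            # Simulated filesystem bug: hand back the previous version.
            return dict(history[-2])
        return dict(history[-1])


def check_index(store):
    """Return index keys whose (block, slot) pointer doesn't lead to a matching heap entry."""
    broken = []
    for key, (blkno, slot) in sorted(store.read(INDEX_BLK).items()):
        heap = store.read(blkno)
        if heap.get(slot) != key:
            broken.append(key)
    return broken


def build_store():
    store = CowStore()
    store.write(HEAP_BLK, {1: "alice"})
    store.write(INDEX_BLK, {"alice": (HEAP_BLK, 1)})
    # Insert "bob": heap and index each get a new, mutually consistent version.
    store.write(HEAP_BLK, {1: "alice", 2: "bob"})
    store.write(INDEX_BLK, {"alice": (HEAP_BLK, 1), "bob": (HEAP_BLK, 2)})
    return store


if __name__ == "__main__":
    store = build_store()
    print(check_index(store))  # [] : heap and index agree
    store.stale_blocks.add(HEAP_BLK)
    print(check_index(store))  # ['bob'] : index points at a slot the stale heap page lacks
```

Note that if both blocks were read back stale, the checker would report nothing: only an inconsistent mix of old and new versions shows up as corruption.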

A few years back I tried to help somebody investigate corruption that turned
out to be caused by something roughly along those lines (IIRC several bugs in
ZFS on Linux, although I don't remember the details anymore).

Not saying that that is the most likely explanation, just something worth
checking.

> Kamigishi Rei has been an exemplary example of how to report a bug to
> an open source community. I want to thank him again. Thanks!

+1

> A second similar complaint from Herman Verschooten on Slack didn't
> mention ZFS at all. A third similar-seeming report on Slack was from
> somebody named Brandon Ros, who used Ubuntu (I believe 20.04, like
> Herman Verschooten). Also no indication that ZFS was used.
>
> I find it slightly hard to believe that it's ZFS, simply because all 3
> complaints involve Postgres 14. And have a lot of common factors. For
> example, Herman also used foreign keys -- a lot of users never bother
> with them. And like Kamigishi Rei, Herman found that a REINDEX (or was
> it VACUUM FULL?) seemingly made the problem go away.

Didn't 14 change the logic that decides when index vacuuming is done? That
could make previously existing issues more likely to manifest.

Greetings,

Andres Freund
