From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17245: Index corruption involving deduplicated entries |
Date: | 2021-10-28 22:48:31 |
Message-ID: | 20211028224831.bj7ew3j74tw4cmvh@alap3.anarazel.de |
Lists: | pgsql-bugs |
Hi,
On 2021-10-28 15:23:38 -0700, Peter Geoghegan wrote:
> Anything is possible. But Kamigishi Rei has said that this database
> has never had a hard crash or unclean shut down, which I definitely
> believe. Also, they are using ECC on a Xeon processor. This is the
> kind of hardware that is generally assumed to be very reliable.
That wouldn't protect against e.g. a logic bug in ZFS. Given its copy-on-write
nature, corruption could very well manifest as seeing an older version of the
data when re-reading it from disk, which could very well lead to the type of
corruption we're seeing here.
A few years back I tried to help somebody investigate corruption that turned
out to be caused by something roughly along those lines (IIRC several bugs in
ZFS on Linux, although I don't remember the details anymore).
Not saying that that is the most likely explanation, just something worth
checking.
> Kamigishi Rei has been an exemplary example of how to report a bug to
> an open source community. I want to thank him again. Thanks!
+1
> A second similar complaint from Herman Verschooten on Slack didn't
> mention ZFS at all. A third similar-seeming report on Slack was from
> somebody named Brandon Ros, who used Ubuntu (I believe 20.04, like
> Herman Verschooten). Also no indication that ZFS was used.
>
> I find it slightly hard to believe that it's ZFS, simply because all 3
> complaints involve Postgres 14. And have a lot of common factors. For
> example, Herman also used foreign keys -- a lot of users never bother
> with them. And like Kamigishi Rei, Herman found that a REINDEX (or was
> it VACUUM FULL?) seemingly made the problem go away.
Didn't 14 change the logic for when index vacuums are done? That could cause
previously existing issues to manifest with a higher likelihood.
Greetings,
Andres Freund