From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Pawel Kudzia <kudzia(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date: | 2021-07-17 21:27:43 |
Message-ID: | 10e8dbad-d21a-b22b-6677-c5c9df9ea6ee@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 17/07/2021 04:56, Peter Geoghegan wrote:
> On Fri, Jul 16, 2021 at 5:30 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Hmm, seems we should fix that. But could a prematurely recycled deleted
>> page cause permanent corruption?
>
> If scans can find a page that is wholly unrelated to the expected page
> (and possibly even in the wrong high level page category), then it's
> really hard to predict what might happen. This could lead to real
> chaos.
Right. Looking at the code, I don't think that can happen. The code that
deals with the pending list uses lock coupling, so it should never
follow a pointer to a page that was concurrently deleted. Might
something funny happen if a pending list page is deleted and immediately
reused as another kind of page? I don't see a problem with that. And a
gin btree page won't be reused prematurely as a pending list page,
because ginNewBuffer() checks GinPageIsRecyclable(), even if the page is
being reused as a pending list page.
> ginInsertCleanup() makes no attempt to perform basic validation
> of its assumptions about what kind of page this is, except for some
> assertions. We should have something like a "can't happen" error on
> !GinPageIsList() inside ginInsertCleanup() -- if we had that already
> then I might be able to reason about this problem. It wouldn't hurt to
> have similar checks in other code that deals with posting trees and
> entry trees, too.
+1
While playing with the trace Pawel sent, I loaded some of the index
pages into a local instance so that I could step through the code with
gdb. I filled the rest of the pages with zeros. I managed to get the gin
btree descend code into an infinite loop with that. When it followed a
pointer to an entry leaf page, but that entry leaf page was substituted
with zeros, it interpreted the all-zeros page as a page with a
right-link to block 0. It then stepped right to read block 0 - which is
the metapage! - and incorrectly interpreted it as an internal page.
Amazingly, it managed to perform the binary search on it as if it
contained index tuples, and came up wth an another block number, which I
had also replaced with all-zeros.
Long story short, we should sprinkle a lot more sanity checks to the
functions in ginbtree.c. It should check that the page is of expected
kind; no stepping from internal entry page to posting tree page. And
seeing a pointer to block 0 at any step should be an error.
But this was just an artifact of the missing pages my test. It doesn't
explain the original problem.
> ginInsertCleanup() is tolerant of all kinds of things. It's not just
> the lack of page-level sanity checks. It's also the basic approach to
> crash safety, which relies on the fact that GIN only does lossy index
> scans. My guess is that there could be lots of problems without it
> being obvious to users. Things really went downhill in
> ginInsertCleanup() starting in commit e956808328.
Yeah, I wouldn't be surprised if this turns out to be a bug in pending
list cleanup. But I haven't been able to come up with a concrete
sequence of steps that would cause it.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2021-07-17 23:25:54 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Previous Message | Heikki Linnakangas | 2021-07-17 20:51:38 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |