From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Pawel Kudzia <kudzia(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date: | 2021-07-17 23:25:54 |
Message-ID: | CAH2-WzkvYQDXBone0vvYH8BJY-EEyRenMvztaDv0ZdoZHp7acw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Sat, Jul 17, 2021 at 2:27 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Right. Looking at the code, I don't think that can happen. The code that
> deals with the pending list uses lock coupling, so it should never
> follow a pointer to a page that was concurrently deleted. Might
> something funny happen if a pending list page is deleted and immediately
> reused as another kind of page? I don't see a problem with that.
I don't immediately see a problem either, but that hardly means much
-- I can't honestly claim that I understand the code here.
Why does the code in ginInsertCleanup() use lock coupling? It's
probably true that if you don't couple pending list pages then the
tests would break in an obvious way, but that tells us nothing about
the design itself. It was probably self-evident what was going on in
earlier versions of the code, back when it could only run in VACUUM
anyway, when it didn't eagerly recycle pending list pages, etc. But
it's far from obvious now that we have all that stuff too.
Let's say processPendingPage() finds a page that is fundamentally of
the wrong kind (i.e. not a pending page). Why shouldn't the result be
something scary (some kind of data corruption), that doesn't come with
an error or hard crash? I can see no reason. Small differences in the
format of the tuples seem unlikely to be caught anywhere, at least
offhand.
Another possible consequence of the lack of sanity checking is that
whole *groups* of sibling pages (say from a posting tree or the entry
tree) that look like they're a group from the pending list (but are
actually from some tree) could get corrupted, one by one. This could
happen as a secondary effect of the initial corruption, which might
have only affected one page at first. Why shouldn't ginInsertCleanup()
do that? Almost all GIN pages use "GinPageGetOpaque(page)->rightlink"
in one way or another. If this is what happened then there really is
no telling what you'll end up with. You may have a very hard time
zeroing in on the truly relevant corruption.
> And a
> gin btree page won't be reused prematurely as a pending list page,
> because ginNewBuffer() checks GinPageIsRecyclable(), even if the page is
> being reused as a pending list page.
It doesn't matter that ginNewBuffer() checks GinPageIsRecyclable(),
since the page header's pd_prune_xid field will be 0 for a pending
list page (same as other index AMs, though not the same as other GIN
code). Note that GinPageIsRecyclable() returns "true" (safe to
recycle) when it sees any "pd_prune_xid = 0" page, by design. Also, if
ginInsertCleanup() ever tries to process/delete a posting tree or
entry tree page, then it's still not going to set pd_prune_xid in the
first place (because it believes that they're pending list pages).
> While playing with the trace Pawel sent, I loaded some of the index
> pages into a local instance so that I could step through the code with
> gdb. I filled the rest of the pages with zeros. I managed to get the gin
> btree descend code into an infinite loop with that. When it followed a
> pointer to an entry leaf page, but that entry leaf page was substituted
> with zeros, it interpreted the all-zeros page as a page with a
> right-link to block 0. It then stepped right to read block 0 - which is
> the metapage! - and incorrectly interpreted it as an internal page.
> Amazingly, it managed to perform the binary search on it as if it
> contained index tuples, and came up wth an another block number, which I
> had also replaced with all-zeros.
That doesn't actually surprise me. I have deliberately corrupted a lot
of test data in all kinds of ways. I have some idea how broken things
can be internally without breaking in a very obvious way.
> Long story short, we should sprinkle a lot more sanity checks to the
> functions in ginbtree.c. It should check that the page is of expected
> kind; no stepping from internal entry page to posting tree page. And
> seeing a pointer to block 0 at any step should be an error.
Definitely.
> Yeah, I wouldn't be surprised if this turns out to be a bug in pending
> list cleanup. But I haven't been able to come up with a concrete
> sequence of steps that would cause it.
I also wouldn't be surprised if it was some unrelated issue, despite
everything. As broken as I think the code in ginInsertCleanup() is, it
has managed to be broken without anybody noticing for quite a while on
a few occasions already. So unfortunately all of my observations about
ginInsertCleanup() may in the end not help you to find the true
underlying bug.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2021-07-18 01:24:42 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Previous Message | Heikki Linnakangas | 2021-07-17 21:27:43 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |