From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: amcheck verification for GiST |
Date: | 2019-03-27 23:57:17 |
Message-ID: | CAH2-WzkswQoYNTgAN0A+WV+=KmEPtuNUv-drEZ+_Q__zjQX4KA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Thanks! I had a look, and refactored it quite a bit.
I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.
> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.
To be fair, that's actually what bt_index_parent_check() does. OTOH,
it has to work in a way that makes it an extension of
bt_index_check(), which is not a problem for
gist_index_parent_check().
> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent page,
> then unlock it, and continue to look at remaining tuples, using the
> pointer that points to an unlocked buffer.
Why not just copy the page into a local buffer? See my remarks on this below.
> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward now,
> IMHO.
You probably didn't mean to leave this "boom" error behind:
> + if (GistPageIsDeleted(page))
> + {
> + elog(ERROR,"boom");
I see that you have this check for deleted pages:
> + if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index \"%s\" has deleted page with tuples",
> + RelationGetRelationName(rel))));
> + }
Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage?
I see various errors like this one:
> + if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index \"%s\" has inconsistent tuple sizes",
> + RelationGetRelationName(rel))));
Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that that kind of stuff could be added in
a number of places.
I see this stuff that's related to concurrent processes:
> + /*
> + * It's possible that the page was split since we looked at the parent,
> + * so that we didn't missed the downlink of the right sibling when we
> + * scanned the parent. If so, add the right sibling to the stack now.
> + */
> + /*
> + * There was a discrepancy between parent and child tuples.
> + * We need to verify it is not a result of concurrent call
> + * of gistplacetopage(). So, lock parent and try to find downlink
> + * for current page. It may be missing due to concurrent page
> + * split, this is OK.
> + */
Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
> + stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);
If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held
like this?
I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory. Maybe it's worthwhile to do something clever with
concurrently holding buffer locks, but if that's what we're doing here
then I would also expect us to have something weaker than ShareLock as
our relation-level heavyweight lock. And, there should be a prominent
explanation of the theory behind it somewhere.
> What have you been using to test this?
pg_hexedit has full support for GiST. ;-)
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Nagaura, Ryohei | 2019-03-28 01:11:23 | RE: Timeout parameters |
Previous Message | David Rowley | 2019-03-27 23:49:02 | Re: Should the docs have a warning about pg_stat_reset()? |