Re: Amcheck verification of GiST and GIN

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2024-07-14 19:00:19
Message-ID: 9B9E93F5-1831-4844-96EA-9B3C4AD31AF4@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas!

Thank you so much for your interest in the patchset.

> On 10 Jul 2024, at 19:01, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> I realized amcheck GIN/GiST support would be useful for testing my
> patches adding parallel builds for these index types, so I decided to
> take a look at this and do an initial review today.

Great! Thank you!

> Attached is a patch series with a extra commits to keep the review
> comments and patches adjusting the formatting by pgindent (the patch
> seems far enough for this).

I was hoping to address your review comments this weekend, but unfortunately I could not. I'll do this ASAP, but at least I decided to post answers on questions.

>
> Let me quickly go through the review comments:
>
> 1) Not sure I like 'amcheck.c' very much, I'd probably go with something
> like 'verify_common.c' to match naming of the other files. But it's just
> nitpicking and I can live with it.

Any name works for me. We have tens of files ending with "common.c", so I think that's a good way to go.

> 2) amcheck_lock_relation_and_check seems to be the most important
> function, yet there's no comment explaining what it does :-(

Makes sense.

> 3) amcheck_lock_relation_and_check still has a TODO to add the correct
> name of the AM

Yes, I've discovered it during rebase and added TODO.

> 4) Do we actually need amcheck_index_mainfork_expected as a separate
> function, or could it be a part of index_checkable?

It was separate function before refactoring...

> 5) The comment for heaptuplespresent says "debug counter" but that does
> not really explain what it's for. (I see verify_nbtree has the same
> comment, but maybe let's improve that.)

It's there for a DEBUG1 message
ereport(DEBUG1,
(errmsg_internal("finished verifying presence of " INT64_FORMAT " tuples from table \"%s\" with bitset %.2f%% set",
But the message is gone for GiST. Perhaps, let's restore this message?

>
> 6) I'd suggest moving the GISTSTATE + blocknum fields to the beginning
> of GistCheckState, it seems more natural to start with "generic" fields.

Makes sense.

> 7) I'd adjust the gist_check_parent_keys_consistency comment a bit, to
> explain what the function does first, and only then explain how.

Makes sense.

> 8) We seem to be copying PageGetItemIdCareful() around, right? And the
> copy in _gist.c still references nbtree - I guess that's not right.

Version differ in two aspects:
1. Size of opaque data may be different. But we can pass it as a parameter.
2. GIN's line pointer verification is slightly more strict.

>
> 9) Why is the GIN function called gin_index_parent_check() and not
> simply gin_index_check() as for the other AMs?

AFAIR function should be called _parent_ if it takes ShareLock. gin_index_parent_check() does not, so I think we should rename it.

> 10) The debug in gin_check_posting_tree_parent_keys_consistency triggers
> assert when running with client_min_messages='debug5', it seems to be
> accessing bogus item pointers.
>
> 11) Why does it add pg_amcheck support only for GiST and not GIN?

GiST part is by far more polished. When we were discussing current implementation with Peter G, we decided that we could finish work on GiST, and then proceed to GIN. Main concern is about GIN's locking model.

> On 12 Jul 2024, at 15:16, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/10/24 18:01, Tomas Vondra wrote:
>> ...
>>
>> That's all for now. I'll add this to the stress-testing tests of my
>> index build patches, and if that triggers more issues I'll report those.
>>
>
> As mentioned a couple days ago, I started using this patch to validate
> the patches adding parallel builds to GIN and GiST indexes - I scripts
> to stress-test the builds, and I added the new amcheck functions as
> another validation step.
>
> For GIN indexes it didn't find anything new (in either this or my
> patches), aside from the assert crash I already reported.
>
> But for GiST it turned out to be very valuable - it did actually find an
> issue in my patches, or rather confirm my hypothesis that the way the
> patch generates fake LSN may not be quite right.
>
> In particular, I've observed these two issues:
>
> ERROR: heap tuple (13315,38) from table "planet_osm_roads" lacks
> matching index tuple within index "roads_7_1_idx"
>
> ERROR: index "roads_7_7_idx" has inconsistent records on page 23723
> offset 113
>
> And those consistency issues are real - I've managed to reproduce issues
> with incorrect query results (by comparing the results to an index built
> without parallelism).
>
> So that's nice - it shows the value of this patch, and I like it.

That's great!

> One thing I've been wondering about is that currently amcheck (in
> general, not just these new GIN/GiST functions) errors out on the first
> issue, because it does ereport(ERROR). Which is good enough to decide if
> there is some corruption, but a bit inconvenient if you need to assess
> how much corruption is there. For example when investigating the issue
> in my patch it would have been great to know if there's just one broken
> page, or if there are dozens/hundreds/thousands of them.
>
> I'd imagine we could have a flag which says whether to fail on the first
> issue, or keep looking at future pages. Essentially, whether to do
> ereport(ERROR) or ereport(WARNING). But maybe that's a dead-end, and
> once we find the first issue it's futile to inspect the rest of the
> index, because it can be garbage. Not sure. In any case, it's not up to
> this patch to invent that.

The thing is amcheck tries hard to to do a core dump. It's still possible to crash it with garbage. But if we continue check after encountering first corruption - increase in SegFaults is inevitable.

Thank you! I hope I can get back to code ASAP.

Best regards, Andrey Borodin.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-07-15 00:45:02 Re: CREATE INDEX CONCURRENTLY on partitioned index
Previous Message Michail Nikolaev 2024-07-14 18:01:50 [BUG?] check_exclusion_or_unique_constraint false negative