Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, lubennikovaav(at)gmail(dot)com
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date: 2024-05-13 12:20:25
Message-ID: CALT9ZEH4AVjhruJzbWynwWLHHLZQQMpCg4BTKi9sGvC+3AjKag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A correction of a typo in previous message:
non-leaf pages iteration cycles (under P_ISLEAF(topaque)) -> non-leaf pages
iteration cycles (under !P_ISLEAF(topaque))

On Mon, 13 May 2024 at 16:19, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> wrote:

>
>
> On Mon, 13 May 2024 at 15:55, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
> wrote:
>
>> Hi, Alexander!
>>
>> On Mon, 13 May 2024 at 05:42, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
>> wrote:
>>
>>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>>> <aekorotkov(at)gmail(dot)com> wrote:
>>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>>> > <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>>> aekorotkov(at)gmail(dot)com> wrote:
>>> > > > The only bt_target_page_check() caller is
>>> > > > bt_check_level_from_leftmost(), which overrides state->target in
>>> the
>>> > > > next iteration anyway. I think the patch is just refactoring to
>>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>>> > >
>>> > > I find your argument unconvincing.
>>> > >
>>> > > After bt_target_page_check() returns at line 919, and before
>>> bt_check_level_from_leftmost() overrides state->target in the next
>>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>>> from the page referenced by state->target. See line 963.
>>> > >
>>> > > I'm left with four possibilities:
>>> > >
>>> > >
>>> > > 1) bt_target_page_check() never gets to the code that uses
>>> "rightpage" rather than "state->target" in the same iteration where
>>> bt_check_level_from_leftmost() conditionally fetches an item from
>>> state->target, so the change you're making doesn't matter.
>>> > >
>>> > > 2) The code prior to v2-0003 was wrong, having changed
>>> state->target in an inappropriate way, causing the wrong thing to happen at
>>> what is now line 963. The patch fixes the bug, because state->target no
>>> longer gets overwritten where you are now using "rightpage" for the value.
>>> > >
>>> > > 3) The code used to work, having set up state->target correctly in
>>> the place where you are now using "rightpage", but v2-0003 has broken that.
>>> > >
>>> > > 4) It's been broken all along and your patch just changes from
>>> wrong to wrong.
>>> > >
>>> > >
>>> > > If you believe (1) is true, then I'm complaining that you are
>>> relying far to much on action at a distance, and that you are not
>>> documenting it. Even with documentation of this interrelationship, I'd be
>>> unhappy with how brittle the code is. I cannot easily discern that the two
>>> don't ever happen in the same iteration, and I'm not at all convinced one
>>> way or the other. I tried to set up some Asserts about that, but none of
>>> the test cases actually reach the new code, so adding Asserts doesn't help
>>> to investigate the question.
>>> > >
>>> > > If (2) is true, then I'm complaining that the commit message doesn't
>>> mention the fact that this is a bug fix. Bug fixes should be clearly
>>> documented as such, otherwise future work might assume the commit can be
>>> reverted with only stylistic consequences.
>>> > >
>>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>>> > >
>>> > > If (4) is true, then maybe we should revert the entire feature, or
>>> have a discussion of mitigation efforts that are needed.
>>> > >
>>> > > Regardless of which of 1..4 you pick, I think it could all do with
>>> more regression test coverage.
>>> > >
>>> > >
>>> > > For reference, I said something similar earlier today in another
>>> email to this thread:
>>> > >
>>> > > This patch introduces a change that stores a new page into variable
>>> "rightpage" rather than overwriting "state->target", which the old
>>> implementation most certainly did. That means that after returning from
>>> bt_target_page_check() into the calling function
>>> bt_check_level_from_leftmost() the value in state->target is not what it
>>> would have been prior to this patch. Now, that'd be irrelevant if nobody
>>> goes on to consult that value, but just 44 lines further down in
>>> bt_check_level_from_leftmost() state->target is clearly used. So the
>>> behavior at that point is changing between the old and new versions of the
>>> code, and I think I'm within reason to ask if it was wrong before the
>>> patch, wrong after the patch, or something else? Is this a bug being
>>> introduced, being fixed, or ... ?
>>> >
>>> > Thank you for your analysis. I'm inclined to believe in 2, but not
>>> > yet completely sure. It's really pity that our tests don't cover
>>> > this. I'm investigating this area.
>>>
>>> It seems that I got to the bottom of this. Changing
>>> BtreeCheckState.target for a cross-page unique constraint check is
>>> wrong, but that happens only for leaf pages. After that
>>> BtreeCheckState.target is only used for setting the low key. The low
>>> key is only used for non-leaf pages. So, that didn't lead to any
>>> visible bug. I've revised the commit message to reflect this.
>>>
>>
>> I agree with your analysis regarding state->target:
>> - when the unique check is on, state->target was reassigned only for the
>> leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
>> - in this level (leaf) in bt_check_level_from_leftmost() this value of
>> state->target was used to get state->lowkey. Then it was reset (in the next
>> iteration of do loop in in bt_check_level_from_leftmost()
>> - state->lowkey lives until the end of pages level (leaf) iteration
>> cycle. Then, low-key is reset (state->lowkey = NULL in the end of
>> bt_check_level_from_leftmost())
>> - state->lowkey is used only in bt_child_check/bt_child_highkey_check.
>> Both are called only from non-leaf pages iteration cycles (under
>> P_ISLEAF(topaque))
>> - Also there is a check (rightblock_number != P_NONE) in before getting
>> rightpage into state->target in bt_target_page_check() that ensures us that
>> rightpage indeed exists and getting this (unused) lowkey in
>> bt_check_level_from_leftmost will not invoke any page reading errors.
>>
>> I'm pretty sure that there was no bug in this, not just the bug was
>> hidden.
>>
>> Indeed re-assigning state->target in leaf page iteration for cross-page
>> unique check was not beautiful, and Peter pointed out this. In my opinion
>> the patch 0003 is a pure code refactoring.
>>
>> As for the cross-page check regression/TAP testing, this test had
>> problems since the btree page layout is not fixed (especially it's
>> different on 32-bit arch). I had a variant for testing cross-page check
>> when the test was yet regression one upthread for both 32/64 bit
>> architectures. I remember it was decided not to include it due to
>> complications and low impact for testing the corner case of very rare
>> cross-page duplicates. (There were also suggestions to drop cross-page
>> duplicates check at all, which I didn't agree 2 years ago, but still it can
>> make sense)
>>
>> Separately, I propose to avoid getting state->lowkey for leaf pages at
>> all as it's unused. PFA is a simple patch for this. (I don't add it to the
>> current patch set as I believe it has nothing to do with UNIQUE constraint
>> check, rather it improves the previous btree amcheck code)
>>
>
> A correction of a typo in previous message:
> non-leaf pages iteration cycles (under !P_ISLEAF(topaque)) -> non-leaf
> pages iteration cycles (under !P_ISLEAF(topaque))
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-05-13 12:25:26 RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Previous Message Pavel Borisov 2024-05-13 12:19:54 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.