Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pawel Kudzia <kudzia(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-06-17 21:55:56
Message-ID: CAPpHfdt5seb3UC7M8nGpNURSV=LqqBWaSs3To3YV_pxUXjA8Qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > Pawel Kudzia <kudzia(at)gmail(dot)com> writes:
> >> with help from IRC we've found that decreasing work_mem from 1MB to 256kB
> >> or less makes the problem go away:
>
> > Hmm. So that suggests that the index itself is *not* corrupt,
> > but the problem is associated with a bug in the indexscan
> > algorithms.
>
> After staring at the code I think there is at least one bug, and
> possibly two, in shimTriConsistentFn. That's likely to be implicated
> here because intarray's GIN opclass only provides a bool consistent
> function. I'm not very clear on the circumstances that lead to the scan
> code inventing GIN_MAYBE inputs, so I haven't been able to construct a
> test case.
>
> The definite bug is triggered because intarray relies on the API
> specification that says that if the search mode is
> GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
> when there's at least one TRUE key:
>
> case RTOverlapStrategyNumber:
> /* result is not lossy */
> *recheck = false;
> /* at least one element in check[] is true, so result = true */
> res = true;
> break;
>
> shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
> for which it'll incorrectly get a TRUE result, as it will also for
> all the following checks. The upshot is that shimTriConsistentFn
> will convert any case with a MAYBE input to a hard TRUE result,
> with no recheck requirement. This'd easily explain the reported
> misbehavior. So in the spot where we do this:
>
> /* First call consistent function with all the maybe-inputs set FALSE */
> for (i = 0; i < nmaybe; i++)
> key->entryRes[maybeEntries[i]] = GIN_FALSE;
> curResult = directBoolConsistentFn(key);
>
> we need to add some code that checks for default searchMode, and in
> that case doesn't call the consistentFn unless at least one
> (non-MAYBE) input is TRUE.

+1,
I think in default search mode we can just start with curResult equal
to GIN_FALSE instead of calling directBoolConsistentFn().

> The other thing that I'm unsure about, because the code is horribly
> underdocumented, is that it's not very clear whether
> shimTriConsistentFn is accurately converting between the bool and
> the tristate APIs. That's because it's not very clear what the
> tristate API actually *is*. In particular, is the end state of
> key->recheckCurItem meaningful in the tristate case? If it's not,
> then the short-circuit case for no MAYBE inputs is broken, because
> it'll return TRUE when the bool consistentFn is trying to tell us
> to recheck. But if it is meaningful, then the multiway case is broken,
> because it will return the recheckCurItem result set by the last call to
> the bool consistentfn; which might be false even though other calls said
> true. (So in that case I think we'd need "key->recheckCurItem = recheck"
> at the end.) I also wonder how any of that works correctly for real
> triconsistent functions, which don't have access to the recheckCurItem
> flag.

As far as I recall, returning MAYBE for no MAYBE inputs in the
triConsistent function is equivalent to setting the recheck flag in
the bool consistent function. Thus, short-circuit case for no MAYBE
inputs should be broken.

> Anyway, I'm punting this to the code authors. I'd like to see
> some comments clarifying what the API really is, not just a
> quick-n-dirty code fix.

Yep, I'm going to have a closer look at this tomorrow.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-06-17 22:04:08 Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Previous Message Tom Lane 2021-06-17 21:51:18 Re: BUG #17062: Assert failed in RemoveRoleFromObjectPolicy() on DROP OWNED policy applied to duplicate role