Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pawel Kudzia <kudzia(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-06-17 19:49:47
Message-ID: 1039976.1623959387@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> Pawel Kudzia <kudzia(at)gmail(dot)com> writes:
>> with help from IRC we've found that decreasing work_mem from 1MB to 256kB
>> or less makes the problem go away:

> Hmm. So that suggests that the index itself is *not* corrupt,
> but the problem is associated with a bug in the indexscan
> algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn. That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function. I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

case RTOverlapStrategyNumber:
/* result is not lossy */
*recheck = false;
/* at least one element in check[] is true, so result = true */
res = true;
break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks. The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement. This'd easily explain the reported
misbehavior. So in the spot where we do this:

/* First call consistent function with all the maybe-inputs set FALSE */
for (i = 0; i < nmaybe; i++)
key->entryRes[maybeEntries[i]] = GIN_FALSE;
curResult = directBoolConsistentFn(key);

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs. That's because it's not very clear what the
tristate API actually *is*. In particular, is the end state of
key->recheckCurItem meaningful in the tristate case? If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck. But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true. (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.) I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2021-06-17 20:33:46 BUG #17062: Assert failed in RemoveRoleFromObjectPolicy() on DROP OWNED policy applied to duplicate role
Previous Message Tom Lane 2021-06-17 14:50:14 Re: BUG #17061: Impossible to query the fields of the tuple created by SEARCH BREADTH FIRST BY .. SET ..