Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Pawel Kudzia <kudzia(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-07-25 19:08:06
Message-ID: 293815bb-3304-a3a7-01a9-f3033ede0a74@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 23/07/2021 18:04, Pawel Kudzia wrote:
> Thanks a lot for your patience and multiple patches that you've
> provided. Please pardon my ignorance - I don't have low-level
> understanding of how the query is being executed - but are you sure
> that index is missing entries and not the other way around - that it
> has too many entries?

To be precise, the index has an extra entry for row (4002784,1) with key
1373, and it's missing the entry with key 38048120. And for row
(4002869,14), it has an extra row for key 1373, and it's missing the
entry for key 95333744.

> To recap - SELECT, answered based on the GIN, reports rows that
> actually do not match the criteria provided in WHERE. Just lowering
> work_mem makes the problem go away, whith GIN still being used.

The reason that lowering work_mem hides the problem is that GIN collects
all the matches in a so called TID bitmap, and if the bitmap grows too
large compared to work_mem, it becomes lossy to save memory. When it's
lossy, it only stores the heap block numbers of the matches. For a
regular, non-lossy, match, the Bitmap Heap Scan just returns the row
that the index says is a match. For the lossy matches, the Bitmap Heap
Scan node needs to check every row on the page to see which ones
actually match. This re-checking hides the problem that some of the
matches that the index reported were not real.

Note that you are also getting incorrect results with missing row for
other queries. You can try it with e.g.:

-- Using the index
set enable_seqscan=off;
set enable_bitmapscan=on;
SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';

-- Without index
set enable_seqscan=on;
set enable_bitmapscan=off;
SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';

I'll work on a patch to add more sanity checks to the GIN code when it
traverses the tree, to catch the case that it accidentally steps on a
wrong kind of a page (I'm pretty busy next week, so might not get to
that until the week after though). I don't think that will help here,
but who knows, and at least we can rule out some kinds of bugs.
Alexander, can you finish the fixes to the ternary logic? That doesn't
explain this corruption either, but we should fix it anyway.

- Heikki

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pawel Kudzia 2021-07-25 20:14:23 Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Previous Message Andrey Borodin 2021-07-25 15:54:29 Re: BUG #17122: panic on prepare with subsequent pg_advisory_lock() and pg_advisory_xact_lock_shared()