From: | Pawel Kudzia <kudzia(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date: | 2021-07-25 20:27:20 |
Message-ID: | CAJYBUS_dTQMK_3+b57GVftH9r0m9t2sURNa26uX1M9YAa7k_5w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Sun, Jul 25, 2021 at 9:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 23/07/2021 18:04, Pawel Kudzia wrote:
> > Thanks a lot for your patience and multiple patches that you've
> > provided. Please pardon my ignorance - I don't have low-level
> > understanding of how the query is being executed - but are you sure
> > that index is missing entries and not the other way around - that it
> > has too many entries?
>
> To be precise, the index has an extra entry for row (4002784,1) with key
> 1373, and it's missing the entry with key 38048120. And for row
> (4002869,14), it has an extra row for key 1373, and it's missing the
> entry for key 95333744.
>
> > To recap - SELECT, answered based on the GIN, reports rows that
> > actually do not match the criteria provided in WHERE. Just lowering
> > work_mem makes the problem go away, whith GIN still being used.
>
> The reason that lowering work_mem hides the problem is that GIN collects
> all the matches in a so called TID bitmap, and if the bitmap grows too
> large compared to work_mem, it becomes lossy to save memory. When it's
> lossy, it only stores the heap block numbers of the matches. For a
> regular, non-lossy, match, the Bitmap Heap Scan just returns the row
> that the index says is a match. For the lossy matches, the Bitmap Heap
> Scan node needs to check every row on the page to see which ones
> actually match. This re-checking hides the problem that some of the
> matches that the index reported were not real.
Thank you for the explanation!
>
> Note that you are also getting incorrect results with missing row for
> other queries. You can try it with e.g.:
>
> -- Using the index
> set enable_seqscan=off;
> set enable_bitmapscan=on;
> SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
> SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';
>
> -- Without index
> set enable_seqscan=on;
> set enable_bitmapscan=off;
> SELECT count(*) FROM entity WHERE attribute_name_ids && '{95333744}';
> SELECT count(*) FROM entity WHERE attribute_name_ids && '{38048120}';
>
>
Actually - both give identical results - count(*) = 0.
I think you actually wanted me to run those:
data=# set enable_seqscan=off;
SET
data=# set enable_bitmapscan=on;
SET
data=# SELECT count(*) FROM entity WHERE attribute_name_ids && '{1737}';
count
-------
79565
(1 row)
data=# set enable_seqscan=on;
SET
data=# set enable_bitmapscan=off;
SET
data=# SELECT count(*) FROM entity WHERE attribute_name_ids && '{1737}';
count
-------
79560
(1 row)
Results indeed differ.
> I'll work on a patch to add more sanity checks to the GIN code when it
> traverses the tree, to catch the case that it accidentally steps on a
> wrong kind of a page (I'm pretty busy next week, so might not get to
> that until the week after though). I don't think that will help here,
> but who knows, and at least we can rule out some kinds of bugs.
> Alexander, can you finish the fixes to the ternary logic? That doesn't
> explain this corruption either, but we should fix it anyway.
>
Thanks a lot! I'm happy to test.
Greetings!
--
regards,
Pawel Kudzia
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2021-07-25 20:27:42 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Previous Message | Pawel Kudzia | 2021-07-25 20:14:23 | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |