Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Pawel Kudzia <kudzia(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-08-02 08:07:24
Message-ID: CAJYBUS8gnevDjh3WRB65M4SfTMJZVAdc+qTs+zHiPfdWcCJaTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jun 17, 2021 at 4:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Pawel Kudzia <kudzia(at)gmail(dot)com> writes:
> > with help from IRC we've found that decreasing work_mem from 1MB to 256kB
> > or less makes the problem go away:
>
> Hmm. So that suggests that the index itself is *not* corrupt,
> but the problem is associated with a bug in the indexscan
> algorithms.
>
> Have you experimented with different index opclasses? Your
> original report said you were using gin__int_ops, but that's
> not the default, and we sort of deprecate intarray these days.

Hello,

Three weeks ago we've changed from

CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);

into

CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids);

Sadly - the problems are back.

I have:
* queries that return incorrect responses:
SET enable_seqscan = OFF;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1485}' )
AND NOT ( (attribute_name_ids||0) && '{1485}') LIMIT 10;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{144}' )
AND NOT ( (attribute_name_ids||0) && '{144}') LIMIT 10;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{2652}' )
AND NOT ( (attribute_name_ids||0) && '{2652}') LIMIT 10;

all three return:
entity_id | primname
-----------+--------------------
99311962 | msci_674591ltepsgt
99312880 | msci_674667mgu

data=# explain analyze SELECT entity_id FROM entity WHERE (
attribute_name_ids && '{2652}' ) AND NOT ( (attribute_name_ids||0) &&
'{2652}') LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30.59..31.71 rows=10 width=8) (actual
time=149.988..200.749 rows=2 loops=1)
-> Bitmap Heap Scan on entity (cost=30.59..3798.81 rows=33644
width=8) (actual time=149.985..200.744 rows=2 loops=1)
Recheck Cond: (attribute_name_ids && '{2652}'::integer[])
Rows Removed by Index Recheck: 79741
Filter: (NOT ((attribute_name_ids || 0) && '{2652}'::integer[]))
Rows Removed by Filter: 16723
Heap Blocks: exact=8277 lossy=9989
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..30.59 rows=33813 width=0) (actual time=9.712..9.713
rows=37367 loops=1)
Index Cond: (attribute_name_ids && '{2652}'::integer[])
Planning Time: 0.472 ms
Execution Time: 200.977 ms
(11 rows)

* first time when hourly consistency check reported the inconsistency
[ 2021-08-01 19:04:15 for entity_id=99312880, 2021-08-01 23:04:15 for
entity_id=99311962 ]
* timestamps when affected row has been updated [ entity_id=99312880:
2021-08-01 16:51:59, 2021-08-01 19:04:06 [ most likely that update led
to inconsistency ], 2021-08-01 23:04:14, 2021-08-02 01:08:18,
2021-08-02 05:12:15, entity_id=99311962 2021-08-01 16:51:59,
2021-08-01 19:04:05, 2021-08-01 23:04:13 [ likely culprit ],
2021-08-02 01:08:16 ]
* wal files from the **master server** from the time when the
inconsistency started to occur [ from 2021-08-01 14:50:37 till
2021-08-02 04:31:00 ; sadly i don't have enough to cover the whole
period between snapshots mentioned below ]
* file-level snapshot of the database taken from streaming
**replication slave** before the problem started to occur [ 2021-08-01
12:00:01 ] and after it occured [ 2021-08-02 00:00:01 ]

I'll be happy to run any diagnostics needed, provide access to
particular wal file.

Greetings!

--
regards,
Pawel Kudzia

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2021-08-02 09:32:16 Re: psql doesn't show tables duplicated in multiple schemas
Previous Message Wojciech Strzalka 2021-08-02 07:05:19 psql doesn't show tables duplicated in multiple schemas