From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Pawel Kudzia <kudzia(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date: | 2021-07-15 18:49:31 |
Message-ID: | d4f4d5e2-35ef-58e4-4f31-5fdfb2c146a0@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 15/07/2021 12:38, Pawel Kudzia wrote:
>>> How can I identify relevant pages to make selective dump of them for
>>> you, using method proposed by Peter?
>>
>> It's a bit hard without knowing which index pages are affected. But the
>> attached script (dump-gin-page-info.sql) dumps some summary information
>> about the kind of pages there are. That's a start.
>
> I'm attaching output of the proposed commands.
Thanks!
>> Let's try one thing first: I rebased a patch to add GIN support to the
>> amcheck extension [1]. I kind of doubt that it will catch the corruption
>> in this case, but you never know. So please apply the attached
>> v2-0001-Amcheck-for-GIN-13stable.patch patch. It is the same patch I
>> posted at [1], but the installation script is slightly different to make
>> it work on PostgreSQL v13. Then compile install it:
>>
>> cd contrib/amcheck; make install
>>
>> Then in psql:
>>
>> CREATE EXTENSION amcheck;
>> gin_index_parent_check('entity_attribute_name_ids_gin');
>
> Sadly i'm getting this when executing CREATE EXTENSION amcheck;
>
> ERROR: extension "amcheck" has no installation script nor update path
> for version "1.2.1"
Hmm, that's odd, it worked for me. Did "make install" copy the
"amcheck--1.2--1.2.1.sql" file to the right place? It should be
installed the same directory as "amcheck.control".
Anyway, let's try a different tack. Here's another debugging patch. It
causes a LOG message to be printed whenever a GIN index page is
accessed, like this:
2021-07-15 21:22:20.119 EEST [972708] LOG: ReadBuffer 1663/12678/27035
read gin blk 0 (ginget.c:1832 scanPendingInsert)
2021-07-15 21:22:20.119 EEST [972708] STATEMENT: explain analyze select
* from test_intarray where a @> '{2}';
2021-07-15 21:22:20.119 EEST [972708] LOG: ReadBuffer 1663/12678/27035
read gin blk 1 (ginbtree.c:89 ginFindLeafPage)
2021-07-15 21:22:20.119 EEST [972708] STATEMENT: explain analyze select
* from test_intarray where a @> '{2}';
Please apply this patch, run the query again, and copy the resulting
log. I'm also interested in the contents of the pages that are accessed,
so if you could then run these commands to dump the contents of those
index pages:
# extract just the block numbers from the log
cat <path to postgres log> | perl -ne '/.*read gin blk (\d+)/ && print
"$1\n" ' | sort -n |uniq > /tmp/blocknums
# dump contents of each of those index pages to '/tmp/block-contents'
psql postgres -c "create temp table blocknums(blknum int); copy
blocknums from '/tmp/blocknums'; copy (select blknum,
get_raw_page('entity_attribute_name_ids_gin', blknum) from blocknums) to
'/tmp/block-contents';"
Please send over the resulting PostgreSQL log, and
'/tmp/block-contents', and I'll take a closer look into the contents to
see if there's something funny. (A private email is OK if you don't feel
comfortable sharing it with the world).
One more thing, please also add the 'ctid' virtual column to the query,
like this:
SELECT ctid, entity_id FROM entity WHERE
( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
&& '{1737}') LIMIT 10;
Also, did you happen to keep the WAL log from around the time that the
query started producing wrong results? That would also be useful
evidence for how it got into this situation.
Thanks in advance! We might need a few more round-trips, but we'll get
there.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
trace-gin-readbuffer-2.patch | text/x-patch | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2021-07-15 19:09:55 | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |
Previous Message | Jeff Janes | 2021-07-15 16:11:07 | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |