From: | Pawel Kudzia <kudzia(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date: | 2021-07-16 07:06:59 |
Message-ID: | CAJYBUS9_VUUYfAen+_pWGa4xxRR-2uvX+_Ycg6W+NqBXVwLY2Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Jul 15, 2021 at 8:49 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 15/07/2021 12:38, Pawel Kudzia wrote:
> >>> How can I identify relevant pages to make selective dump of them for
> >>> you, using method proposed by Peter?
> >>
> >> It's a bit hard without knowing which index pages are affected. But the
> >> attached script (dump-gin-page-info.sql) dumps some summary information
> >> about the kind of pages there are. That's a start.
> >
> > I'm attaching output of the proposed commands.
>
> Thanks!
>
> >> Let's try one thing first: I rebased a patch to add GIN support to the
> >> amcheck extension [1]. I kind of doubt that it will catch the corruption
> >> in this case, but you never know. So please apply the attached
> >> v2-0001-Amcheck-for-GIN-13stable.patch patch. It is the same patch I
> >> posted at [1], but the installation script is slightly different to make
> >> it work on PostgreSQL v13. Then compile install it:
> >>
> >> cd contrib/amcheck; make install
> >>
> >> Then in psql:
> >>
> >> CREATE EXTENSION amcheck;
> >> gin_index_parent_check('entity_attribute_name_ids_gin');
> >
> > Sadly i'm getting this when executing CREATE EXTENSION amcheck;
> >
> > ERROR: extension "amcheck" has no installation script nor update path
> > for version "1.2.1"
>
> Hmm, that's odd, it worked for me. Did "make install" copy the
> "amcheck--1.2--1.2.1.sql" file to the right place? It should be
> installed the same directory as "amcheck.control".
>
>
thank you for the hint. i've messed up patching. now i can get one step
further:
data=# CREATE EXTENSION amcheck;
CREATE EXTENSION
data=# gin_index_parent_check('entity_attribute_name_ids_gin');
ERROR: syntax error at or near "gin_index_parent_check"
LINE 1: gin_index_parent_check('entity_attribute_name_ids_gin');
^
i've also applied trace-gin-readbuffer-2.patch
> Anyway, let's try a different tack. Here's another debugging patch. It
> causes a LOG message to be printed whenever a GIN index page is
> accessed, like this:
>
> 2021-07-15 21:22:20.119 EEST [972708] LOG: ReadBuffer 1663/12678/27035
> read gin blk 0 (ginget.c:1832 scanPendingInsert)
> 2021-07-15 21:22:20.119 EEST [972708] STATEMENT: explain analyze select
> * from test_intarray where a @> '{2}';
> 2021-07-15 21:22:20.119 EEST [972708] LOG: ReadBuffer 1663/12678/27035
> read gin blk 1 (ginbtree.c:89 ginFindLeafPage)
> 2021-07-15 21:22:20.119 EEST [972708] STATEMENT: explain analyze select
> * from test_intarray where a @> '{2}';
>
> Please apply this patch, run the query again, and copy the resulting
> log.
that'll be the file called "log", send directly to your e-mail address.
> I'm also interested in the contents of the pages that are accessed,
> so if you could then run these commands to dump the contents of those
> index pages:
>
> # extract just the block numbers from the log
> cat <path to postgres log> | perl -ne '/.*read gin blk (\d+)/ && print
> "$1\n" ' | sort -n |uniq > /tmp/blocknums
>
> # dump contents of each of those index pages to '/tmp/block-contents'
> psql postgres -c "create temp table blocknums(blknum int); copy
> blocknums from '/tmp/blocknums'; copy (select blknum,
> get_raw_page('entity_attribute_name_ids_gin', blknum) from blocknums) to
> '/tmp/block-contents';"
>
done, side note - i had to run "create extension pageinspect" and provide
name of my database - "data" rather than "postgres".
Please send over the resulting PostgreSQL log, and
> '/tmp/block-contents', and I'll take a closer look into the contents to
> see if there's something funny. (A private email is OK if you don't feel
> comfortable sharing it with the world).
>
> One more thing, please also add the 'ctid' virtual column to the query,
> like this:
>
> SELECT ctid, entity_id FROM entity WHERE
> ( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
> && '{1737}') LIMIT 10;
>
that's how the queries above were run. output:
ctid | entity_id
--------------+-----------
(4002784,1) | 38048120
(4002869,14) | 95333744
(2 rows)
> Also, did you happen to keep the WAL log from around the time that the
> query started producing wrong results? That would also be useful
> evidence for how it got into this situation.
>
sadly i don't. as noted earlier - we've already switched the production
setup so i have no way of leading to corruption again, i only have two
file-level backups of the database with two different inconsistencies in
the SELECTs.
greetings!
--
regards,
Pawel Kudzia
From | Date | Subject | |
---|---|---|---|
Next Message | Sergei Kornilov | 2021-07-16 07:53:03 | Re: BUG #17111: Database created, cannot be created, but reported as inexist |
Previous Message | Alvaro Herrera | 2021-07-16 02:49:11 | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |