From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Heikki <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PANIC in GIN code |
Date: | 2015-06-29 16:20:11 |
Message-ID: | CAMkU=1zUZPhY+Dt6dy3YNqX8384RFk2Rj71bUm2_Nbz9wCG56w@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jun 29, 2015 at 1:37 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 06/29/2015 01:12 AM, Jeff Janes wrote:
>
>> Now I'm getting a different error, with or without checksums.
>>
>> ERROR: invalid page in block 0 of relation base/16384/16420
>> CONTEXT: automatic vacuum of table "jjanes.public.foo"
>>
>> 16420 is the gin index. I can't even get the page with pageinspect:
>>
>> jjanes=# SELECT * FROM get_raw_page('foo_text_array_idx', 0);
>> ERROR: invalid page in block 0 of relation base/16384/16420
>>
>> This is the last few gin entries from pg_xlogdump
>>
>>
>> rmgr: Gin len (rec/tot): 0/ 3893, tx: 0, lsn:
>> 0/77270E90, prev 0/77270E68, desc: VACUUM_PAGE , blkref #0: rel
>> 1663/16384/16420 blk 27 FPW
>> rmgr: Gin len (rec/tot): 0/ 3013, tx: 0, lsn:
>> 0/77272080, prev 0/77272058, desc: VACUUM_PAGE , blkref #0: rel
>> 1663/16384/16420 blk 6904 FPW
>> rmgr: Gin len (rec/tot): 0/ 3093, tx: 0, lsn:
>> 0/77272E08, prev 0/77272DE0, desc: VACUUM_PAGE , blkref #0: rel
>> 1663/16384/16420 blk 1257 FPW
>> rmgr: Gin len (rec/tot): 8/ 4662, tx: 318119897, lsn:
>> 0/77A2CF10, prev 0/77A2CEC8, desc: INSERT_LISTPAGE , blkref #0: rel
>> 1663/16384/16420 blk 22184
>> rmgr: Gin len (rec/tot): 88/ 134, tx: 318119897, lsn:
>> 0/77A2E188, prev 0/77A2E160, desc: UPDATE_META_PAGE , blkref #0: rel
>> 1663/16384/16420 blk 0
>>
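The block references in the records above can be tallied mechanically. The following is a minimal Python sketch, assuming the mail-wrapped pg_xlogdump lines are first rejoined into one line per record. The regular expressions and the `summarize` helper are illustrative names, not part of pg_xlogdump.

```python
import re

# Last three records from the excerpt above, rejoined to one line each.
RECORDS = [
    "rmgr: Gin len (rec/tot): 0/ 3093, tx: 0, lsn: 0/77272E08, "
    "prev 0/77272DE0, desc: VACUUM_PAGE , blkref #0: rel 1663/16384/16420 blk 1257 FPW",
    "rmgr: Gin len (rec/tot): 8/ 4662, tx: 318119897, lsn: 0/77A2CF10, "
    "prev 0/77A2CEC8, desc: INSERT_LISTPAGE , blkref #0: rel 1663/16384/16420 blk 22184",
    "rmgr: Gin len (rec/tot): 88/ 134, tx: 318119897, lsn: 0/77A2E188, "
    "prev 0/77A2E160, desc: UPDATE_META_PAGE , blkref #0: rel 1663/16384/16420 blk 0",
]

LSN_RE = re.compile(r"lsn: ([0-9A-F]+/[0-9A-F]+)")
DESC_RE = re.compile(r"desc: (\w+)")
# One match per block reference; the trailing " FPW" marks a full-page image.
BLKREF_RE = re.compile(r"blkref #(\d+): rel (\S+) blk (\d+)( FPW)?")


def summarize(record):
    """Return the LSN, record type, and block references of one record line."""
    return {
        "lsn": LSN_RE.search(record).group(1),
        "desc": DESC_RE.search(record).group(1),
        "blkrefs": [
            {
                "id": int(m.group(1)),
                "rel": m.group(2),
                "blk": int(m.group(3)),
                "fpw": m.group(4) is not None,
            }
            for m in BLKREF_RE.finditer(record)
        ],
    }


for rec in RECORDS:
    s = summarize(rec)
    print(s["lsn"], s["desc"], [(b["blk"], b["fpw"]) for b in s["blkrefs"]])
```

On this excerpt, the final UPDATE_META_PAGE record yields a single block reference (block 0) with no full-page image.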
>
Another piece of info here that might be relevant. Almost all
UPDATE_META_PAGE xlog records other than the last one have two backup
blocks. The last UPDATE_META_PAGE record only has one backup block.
And the metapage is mostly zeros:
>>
>> head -c 8192 /tmp/data2_invalid_page/base/16384/16420 | od
>> 0000000 000000 000000 161020 073642 000000 000000 000000 000000
>> 0000020 000000 000000 000000 000000 053250 000000 053250 000000
>> 0000040 006140 000000 000001 000000 000001 000000 000000 000000
>> 0000060 031215 000000 000452 000000 000000 000000 000000 000000
>> 0000100 025370 000000 000000 000000 000002 000000 000000 000000
>> 0000120 000000 000000 000000 000000 000000 000000 000000 000000
>> *
>> 0020000
>>
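The first two lines of the dump above can be decoded by hand. The following is a minimal Python sketch, assuming a little-endian machine, the standard 24-byte page header layout (pd_lsn, pd_checksum, pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid), and that the GIN metapage data begins right after the header with the pending-list head and tail block numbers. The helper names are illustrative.

```python
import struct

# First two lines of the od output above: 16-bit words printed in octal.
OD_WORDS = """
000000 000000 161020 073642 000000 000000 000000 000000
000000 000000 000000 000000 053250 000000 053250 000000
"""


def od_words_to_bytes(words):
    """Turn od's octal 16-bit words back into raw bytes (little-endian assumed)."""
    return b"".join(struct.pack("<H", int(w, 8)) for w in words.split())


def decode(raw):
    """Decode the assumed page header layout plus the first two metapage fields."""
    (xlogid, xrecoff, checksum, flags, lower, upper, special,
     pagesize_version, prune_xid) = struct.unpack_from("<IIHHHHHHI", raw, 0)
    # Assumption: metapage data starts at offset 24 with head and tail.
    head, tail = struct.unpack_from("<II", raw, 24)
    return {
        "pd_lsn": "%X/%X" % (xlogid, xrecoff),
        "pd_checksum": checksum,
        "pd_flags": flags,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "pd_pagesize_version": pagesize_version,
        "pd_prune_xid": prune_xid,
        "head": head,
        "tail": tail,
    }


fields = decode(od_words_to_bytes(OD_WORDS))
for name, value in fields.items():
    print(name, value)
```

Under those assumptions the page LSN decodes to 0/77A2E210, which lies just past the final UPDATE_META_PAGE record at 0/77A2E188, and head and tail both decode to 22184, the block named in the INSERT_LISTPAGE record. Every other header field, including pd_lower, pd_upper, and pd_special, is zero.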
>
> Hmm. Looking at ginRedoUpdateMetapage, I think I see the problem: it
> doesn't initialize the page. It copies the metapage data, but it doesn't
> touch the page headers. The only way I can see that that would cause
> trouble is if the index somehow got truncated away or removed in the
> standby. That could happen in crash recovery, if you drop the index and then
> crash, but that should be harmless, because crash recovery doesn't try to
> read the metapage, only update it (by overwriting it), and by the time
> crash recovery has completed, the index drop is replayed too.
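To illustrate why a page whose headers were never initialized is later rejected as invalid, here is a minimal Python sketch loosely modeled on the header sanity checks that PageIsVerified performs. It is a simplification under stated assumptions: checksums are ignored, the block size is the default 8192, and the flag mask and alignment constants are assumed values. `make_page` and `page_looks_valid` are illustrative names.

```python
import struct

BLCKSZ = 8192                # default PostgreSQL block size
PD_VALID_FLAG_BITS = 0x0007  # assumed mask of valid pd_flags bits
MAXIMUM_ALIGNOF = 8          # assumed alignment on a 64-bit build


def maxalign(n):
    """Round n up to the next multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def make_page(xrecoff, lower, upper, special, payload=b""):
    """Build a block: 24-byte header, optional payload, zero padding."""
    header = struct.pack("<IIHHHHHHI", 0, xrecoff, 0, 0, lower, upper, special, 0, 0)
    return (header + payload).ljust(BLCKSZ, b"\x00")


def page_looks_valid(page):
    """Simplified header sanity check; checksum verification is omitted."""
    flags, lower, upper, special = struct.unpack_from("<HHHH", page, 10)
    if upper != 0:  # pd_upper == 0 means the page is treated as new
        if ((flags & ~PD_VALID_FLAG_BITS) == 0
                and lower <= upper <= special <= BLCKSZ
                and special == maxalign(special)):
            return True
    # Otherwise only a completely zeroed page is acceptable.
    return page == bytes(len(page))


# Metapage data and LSN written, but header offsets left at zero.
uninitialized = make_page(0x77A2E210, 0, 0, 0, struct.pack("<II", 22184, 22184))
# Hypothetical properly initialized page, for comparison.
initialized = make_page(0x77A2E210, 24, 8184, 8184)

print(page_looks_valid(uninitialized))  # rejected: neither sane header nor all zeros
print(page_looks_valid(initialized))
print(page_looks_valid(bytes(BLCKSZ)))
```

In this sketch the page with data but zeroed header offsets fails both branches: pd_upper is zero, so the header checks are skipped, and the page is not all zeros either. That matches the shape of the metapage in the dump.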
>
> But AFAICS that bug is present in earlier versions too.
Yes, I did see this error reported previously but it was always after the
first appearance of the PANIC, so I assumed it was a sequela to that and
didn't investigate it further at that time.
> Can you reproduce this easily? How?
I can reproduce it fairly easily.
I apply the attached patch and compile with --enable-cassert (full list of
configure options: '--enable-debug' '--with-libxml' '--with-perl'
'--with-python' '--with-ldap' '--with-openssl' '--with-gssapi'
'--prefix=/home/jjanes/pgsql/torn_bisect/' '--enable-cassert').
Then edit do.sh to point to the data directory and installation directory
you want, and run that. It calls count.pl from the same directory. I
started getting the errors after about 10 minutes on an 8-core Intel(R)
Xeon(R) CPU E5-2650 0 @ 2.00GHz.
sh do.sh >& do_cassert_fix.out2 &
The output is quite a mess, mingling log output from PostgreSQL and from Perl.
Since I already know what I'm looking for, I use:
tail -f do_cassert_fix.out2 |fgrep ERROR
Cheers,
Jeff
Attachment | Content-Type | Size |
---|---|---|
do.sh | text/x-sh | 4.3 KB |
count.pl | text/x-perl | 8.3 KB |
crash_REL9_5.patch | text/x-diff | 11.3 KB |