From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) |
Date: | 2016-05-10 07:05:40 |
Message-ID: | 20160510070540.lyvintwdiw4ogmnw@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
> I've bisected the errors I was seeing, discussed in
> http://www.postgresql.org/message-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com
>
> It look like they first appear in:
>
> commit 48354581a49c30f5757c203415aa8412d85b0f70
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: Sun Apr 10 20:12:32 2016 -0700
>
> Allow Pin/UnpinBuffer to operate in a lockfree manner.
>
>
> I get the errors:
>
> ERROR: attempted to delete invisible tuple
> STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2
>
> And also:
>
> ERROR: unexpected chunk number 1 (expected 2) for toast value
> 85223889 in pg_toast_16424
> STATEMENT: update foo set count=count+1 where text_array @> $1
I'm a bit dumbfounded here. I think I found the issue, and, if so, it's
*very* longstanding. HeapTupleSatisfiesToast(), aborted transactions,
and hint bits (or oid wraparound) appear to be a broken combination.
What I'm seing is that tuptoaster.c is trying to fetch or delete toast
chunks generated in an aborted (via crash) transaction. And that, with
the same valueid/chunk_id, a valid value exist somewhere else in the
toast table.
HeapTupleSatisfiesToast() does not do any actual tuple liveliness
detection besides checking basic hint bits. Thus, if there's a dead
tuple in the toast table's heap, which a *live* reference from the
table, toast_fetch_datum() potentially returns the wrong results.
That makes the new question: How can there be a live reference to a dead
and reused toast id? As far as I can see the primary problem is that
GetNewOidWithIndex() uses SnapshotDirty to verify whether a tuple is
live. As that actually verifies tuple liveliness (and sets hint bits),
it'll skip over an aborted toasted datum. A valid question is why the
hint bits set by HeapTupleSatisfiesDirty() doesn't prevent that from
occuring (besides standbys which aren't evolved): It's the constant
crashes. Looking at a tuple which triggers the "unexpected chunk
number" error, it indeed has a live and a dead version of the same chunk
id.
Interestingly the dead rows looks like:
chunk_id = 1 - t_infomask: 0xA02 - HEAP_XMAX_INVALID | HEAP_XMIN_INVALID | HEAP_HASVARWIDTH
chunk_id = 2 - t_infomask: 0x802 - HEAP_XMAX_INVALID | HEAP_HASVARWIDTH
My rather strong suspicion on why that is, is that both tuples were
visited by SnapshotDirty(), but only the first ones hint bits got
persisted, because checksums were enabled. The second time through
MarkBufferDirtyHint skipped work, because the buffer was already dirty.
rmgr: XLOG len (rec/tot): 0/ 3809, tx: 206127725, lsn: 1C/EB3B0910, prev 1C/EB3B08E0, desc: FPI_FOR_HINT , blkref #0: rel 1663/16384/16433 blk 202567 FPW
that's indeed the only time the affected page is touched as visible to
pg_xlogdump after being filled...
The easy way to trigger this problem would be to have an oid wraparound
- but the WAL shows that that's not the case here. I've not figured
that one out entirely (and won't tonight). But I do see WAL records
like:
rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
i.e. two NEXTOID records allocating the same range, which obviously
doesn't seem right. There's also every now and then close by ranges:
rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455
rmgr: XLOG len (rec/tot): 4/ 30, tx: 7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461
As far as I can see something like the above, or an oid wraparound, are
pretty much deadly for toast.
Is anybody ready with a good defense for SatisfiesToast not doing any
actual liveliness checks?
Andres
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2016-05-10 07:56:50 | Re: Use %u to print user mapping's umid and userid |
Previous Message | konstantin knizhnik | 2016-05-10 07:00:03 | Re: asynchronous and vectorized execution |