From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | atomic pin/unpin causing errors |
Date: | 2016-04-29 17:38:55 |
Message-ID: | CAMkU=1w85Dqt766AUrCnyqCXfZ+rsk1witAc_=v5+Pce93Sftw@mail.gmail.com |
Lists: | pgsql-hackers |
I've bisected the errors I was seeing, discussed in
http://www.postgresql.org/message-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com
It looks like they first appear in:
commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: Sun Apr 10 20:12:32 2016 -0700
Allow Pin/UnpinBuffer to operate in a lockfree manner.
I get the errors:
ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2
And also:
ERROR: unexpected chunk number 1 (expected 2) for toast value 85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1
Once these errors start occurring, they happen often. Usually the
"attempted to delete invisible tuple" happens first.
These errors show up after about 9 hours of run time. The timing is
predictable enough that I don't think it is a purely stochastic race
condition. It seems like some counter variable is overflowing. But
it is not the ShmemVariableCache->nextXid counter, as I previously
speculated. This test does not advance that counter fast enough for it to
wrap around within 9 hours of run time. But I am at a loss as to what
other variable it might be. Since the system goes through a crash and
recovery every few seconds, any backend-local counters or
shared-memory counters would get reset upon recovery. Right?
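
As a rough check on the wraparound reasoning above, here is a minimal
sketch of the arithmetic, assuming a plain unsigned counter that is
incremented once per event and never reset. The helper name and the
16-bit comparison are illustrative assumptions, not anything taken
from the PostgreSQL source.

```python
# Hypothetical helper (not from PostgreSQL): how fast must an unsigned
# counter be incremented to wrap around within a given run time?
def required_rate_per_second(counter_bits: int, hours: float) -> float:
    """Increments per second needed to wrap a counter of the given width."""
    return (2 ** counter_bits) / (hours * 3600.0)

# 32-bit counter (the width of a transaction ID) over the observed 9 hours.
rate_32 = required_rate_per_second(32, 9)

# 16-bit counter over the same period, purely for comparison.
rate_16 = required_rate_per_second(16, 9)

print(f"32-bit: {rate_32:,.0f} increments/s")
print(f"16-bit: {rate_16:.2f} increments/s")
```

A 32-bit counter would need well over 100,000 increments per second,
sustained, to wrap in 9 hours, while a 16-bit one would need only
about two per second. That says nothing about which counter is
involved, only what width and rate combinations fit the timing.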
I think the invisible tuple referred to might be a tuple in the toast
table, not in the parent table.
I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.
Any suggestions about where or how to look? I don't know if the
"attempted to delete invisible tuple" is the bug itself, or is just
tripping over corruption left behind by someone else.
(This was all run using Teodor's test-enabling patch
gin_alone_cleanup-4.patch, so as not to change horses in midstream.
Now that a version of that patch has been committed, I will try to
repeat this in HEAD)
Cheers,
Jeff