From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Concurrency bug with vacuum full (cluster) and toast
Date: 2019-03-18 16:53:22
Message-ID: CAPpHfdu3PJUzHpQrvJ5RC9bEX_bZ6LwX52kBpb0EiD_9e3Np5g@mail.gmail.com
Hi all,
I've discovered a bug where vacuum full fails with an error because it
couldn't find toast chunks deleted by itself. That happens because
cluster_rel() sets OldestXmin, but toast accesses get their snapshot later
and independently. That causes heap_page_prune_opt() to clean up chunks
which rebuild_relation() expects to exist. This bug happens very rarely,
on busy systems which actively update toasted values, but I found a way
to reliably reproduce it using a debugger.
*Setup*
CREATE FUNCTION random_string(seed integer, length integer) RETURNS text
AS $$
    SELECT substr(
               string_agg(
                   substr(
                       encode(
                           decode(
                               md5(seed::text || '-' || i::text),
                               'hex'),
                           'base64'),
                       1, 21),
                   ''),
               1, length)
    FROM generate_series(1, (length + 20) / 21) i;
$$ LANGUAGE SQL;
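As a quick sanity check: each generate_series step contributes 21 base64
characters and the outer substr trims the result, so the function returns
a string of exactly the requested length:

SELECT length(random_string(1, 100000));  -- expected: 100000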
CREATE TABLE test (val text);
INSERT INTO test VALUES (random_string(1,100000));
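A 100 kB text value is far above the ~2 kB toast threshold, so it's
guaranteed to be stored out of line. You can locate the corresponding
toast table like this (the pg_toast_16387 seen in the error below is
from my installation; the OID will differ elsewhere):

SELECT reltoastrelid::regclass AS toast_table
FROM pg_class WHERE relname = 'test';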
*Reproduction steps*
s1-s3 are three parallel PostgreSQL sessions.
s3lldb is lldb attached to the backend process of s3.
First, s1 acquires a snapshot and holds it.
s1# begin transaction isolation level repeatable read;
s1# select 1;
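Optionally, you can verify from any other session that s1 is now holding
back the xmin horizon:

SELECT pid, backend_xmin FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;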
Then s2 updates our toasted value several times.
s2# update test set val = random_string(2,100000);
s2# update test set val = random_string(3,100000);
s2# update test set val = random_string(4,100000);
s2# update test set val = random_string(5,100000);
s2# update test set val = random_string(6,100000);
s2# update test set val = random_string(7,100000);
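At this point the toast table holds several dead versions of the value
that can't be cleaned up yet because of s1's snapshot. If you have the
pgstattuple extension available, you can observe them (substituting the
toast table name from your installation):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT dead_tuple_count FROM pgstattuple('pg_toast.pg_toast_16387');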
Then s3 starts vacuum full, stopping at vacuum_set_xid_limits().
s3lldb# b vacuum_set_xid_limits
s3# vacuum full test;
We let vacuum_set_xid_limits() finish, making sure the old tuple versions
created by s2 will be considered recently dead by vacuum full.
s3lldb# finish
Then s1 releases its snapshot. Now heap_page_prune_opt(), called during
toast access, can clean up the toast chunks which vacuum full expects
to find recently dead.
s1# commit;
Finally, we continue our vacuum full and get the error:
s3lldb# continue
s3#
ERROR: unexpected chunk number 50 (expected 2) for toast value 16429
in pg_toast_16387
The attached patch contains a dirty fix for this bug, which just prevents
heap_page_prune_opt() from cleaning tuples when it's called from
rebuild_relation(). It's not something I'm proposing to commit or even
review; it might just serve as a starting point for thoughts.
Any ideas?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: cluster_toast_concurrency_fix.patch (application/octet-stream, 4.1 KB)