From: | marco(dot)nenciarini(at)2ndquadrant(dot)it |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions |
Date: | 2015-06-26 13:43:10 |
Message-ID: | 20150626134310.3876.82768@wrigleys.postgresql.org |
Lists: | pgsql-bugs pgsql-hackers |
The following bug has been logged on the website:
Bug reference: 13473
Logged by: Marco Nenciarini
Email address: marco(dot)nenciarini(at)2ndquadrant(dot)it
PostgreSQL version: 9.4.4
Operating system: all
Description:
= Symptoms
Consider a simple master -> standby setup with hot_standby_feedback
enabled. If a backend on the standby is holding the cluster xmin and the
master runs VACUUM FREEZE in the same database that backend is connected
to, replaying the freeze record generates a recovery conflict and the
query running on the standby is canceled.
= How to reproduce it
Run the following steps on an otherwise idle cluster.
1) connect to the standby and simulate a long running query:
select pg_sleep(3600);
2) connect to the master and run the following script
create table t(id int primary key);
insert into t select generate_series(1, 10000);
vacuum freeze verbose t;
drop table t;
3) after about 30 seconds (which matches the default
max_standby_streaming_delay) the pg_sleep query on the standby is canceled.
= Expected output
Hot standby feedback should have prevented the query cancellation.
= Analysis
I've run postgres at DEBUG2 logging level, and I can confirm that the vacuum
correctly sees the OldestXmin propagated by the standby through hot
standby feedback.
The issue is in the heap_xlog_freeze function, which first calls
ResolveRecoveryConflictWithSnapshot, passing the cutoff_xid value as the
first argument.
The cutoff_xid is the OldestXmin that was active when the vacuum ran, so it
represents a running xid.
However, ResolveRecoveryConflictWithSnapshot expects its first argument to
be latestRemovedXid, which represents the highest xid that has actually
been removed, so there is an off-by-one error.
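To make the off-by-one concrete, here is a minimal standalone sketch of the comparison involved. It is a simplified model, not the PostgreSQL source: the names xid_follows and snapshot_conflicts are hypothetical, the TransactionId typedef is redeclared locally, and the special handling of invalid and permanent xids is left out. It assumes that a standby snapshot is treated as conflicting unless its xmin is strictly newer than latestRemovedXid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the server's 32-bit transaction id type. */
typedef uint32_t TransactionId;

/*
 * Hypothetical helper: modulo-2^32 comparison for normal xids.
 * Returns true if a is logically newer than b.
 */
bool xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * Hypothetical model of the conflict test: a standby snapshot conflicts
 * unless its xmin is strictly newer than the latest removed xid.
 */
bool snapshot_conflicts(TransactionId snapshot_xmin,
                        TransactionId latest_removed_xid)
{
    return !xid_follows(snapshot_xmin, latest_removed_xid);
}
```

With hot standby feedback, the standby's xmin can equal the master's OldestXmin, say 1000. Under this model, passing cutoff_xid = 1000 as latestRemovedXid makes snapshot_conflicts(1000, 1000) true, while passing 999 makes it false.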
I've been able to reproduce this issue on every version of postgres since
9.0 (9.0, 9.1, 9.2, 9.3, 9.4 and current master).
= Proposed solution
In heap_xlog_freeze we need to subtract one from the value of cutoff_xid
before passing it to ResolveRecoveryConflictWithSnapshot.
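One detail worth keeping in mind is that a plain "cutoff_xid - 1" can land on one of the reserved xids (0, 1, 2) when cutoff_xid is the first normal xid. The sketch below is a standalone model of a wraparound-aware decrement, in the spirit of the TransactionIdRetreat macro in the server headers. The function name xid_retreat is hypothetical, and the typedef and constant are redeclared locally so the snippet compiles on its own; it is not a patch against heap_xlog_freeze.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the server's definitions. */
typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)

/*
 * Hypothetical helper: step back to the previous normal xid,
 * skipping the reserved values 0, 1 and 2 on wraparound.
 */
TransactionId xid_retreat(TransactionId xid)
{
    /* This model only handles normal xids as input. */
    assert(xid >= FIRST_NORMAL_XID);

    do {
        xid--;
    } while (xid < FIRST_NORMAL_XID);

    return xid;
}
```

Under these assumptions, the value handed to ResolveRecoveryConflictWithSnapshot would be xid_retreat(cutoff_xid) rather than cutoff_xid itself, so a standby snapshot whose xmin equals cutoff_xid is no longer treated as conflicting.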