From: | Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [BUGS] BUG #13473: VACUUM FREEZE mistakenly cancel standby sessions |
Date: | 2015-06-26 13:50:41 |
Message-ID: | 558D58B1.70400@2ndquadrant.it |
Lists: | pgsql-bugs pgsql-hackers |
On 26/06/15 15:43, marco(dot)nenciarini(at)2ndquadrant(dot)it wrote:
> The following bug has been logged on the website:
>
> Bug reference: 13473
> Logged by: Marco Nenciarini
> Email address: marco(dot)nenciarini(at)2ndquadrant(dot)it
> PostgreSQL version: 9.4.4
> Operating system: all
> Description:
>
> = Symptoms
>
> Consider a simple master -> standby setup with hot_standby_feedback
> enabled. If a backend on the standby is holding the cluster xmin and the
> master runs a VACUUM FREEZE on the same database as that backend, the
> vacuum generates a recovery conflict and the query running on the standby
> is canceled.
>
> = How to reproduce it
>
> Run the following steps on an idle cluster.
>
> 1) connect to the standby and simulate a long running query:
>
> select pg_sleep(3600);
>
> 2) connect to the master and run the following script
>
> create table t(id int primary key);
> insert into t select generate_series(1, 10000);
> vacuum freeze verbose t;
> drop table t;
>
> 3) after 30 seconds the pg_sleep query on standby will be canceled.
>
> = Expected output
>
> The hot standby feedback should have prevented the query cancellation
>
> = Analysis
>
> I've run postgres at DEBUG2 logging level, and I can confirm that the vacuum
> correctly sees the OldestXmin propagated by the standby through hot
> standby feedback.
> The issue is in the heap_xlog_freeze function, which calls
> ResolveRecoveryConflictWithSnapshot as its first action, passing the
> cutoff_xid value as the first argument.
> The cutoff_xid is the OldestXmin that was active when the vacuum ran, so it
> represents a running xid.
> However, ResolveRecoveryConflictWithSnapshot expects its first argument to
> be latestRemovedXid, which represents the highest xid that has actually
> been removed, so there is an off-by-one error.
>
> I've been able to reproduce this issue for every version of postgres since
> 9.0 (9.0, 9.1, 9.2, 9.3, 9.4 and current master)
>
> = Proposed solution
>
> In heap_xlog_freeze we need to subtract one from the value of cutoff_xid
> before passing it to ResolveRecoveryConflictWithSnapshot.
>
>
>
Attached is a proposed patch that solves the issue.
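
To illustrate the off-by-one described in the analysis, here is a small standalone C sketch. It is not the attached patch and does not use PostgreSQL headers: the type Xid and the helpers xid_precedes_or_equals, snapshot_conflicts and xid_retreat are hypothetical stand-ins, written on the assumption that a standby snapshot conflicts when its xmin precedes or equals the xid passed to the conflict check, and that stepping an xid backwards must skip the reserved values below 3. Special xids and other details of the real comparison are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction id (32-bit, wraps around). */
typedef uint32_t Xid;

/* Assumption: xids below this value are reserved and never "normal". */
#define FIRST_NORMAL_XID 3u

/* Simplified circular comparison: true if a is at or before b. */
static bool xid_precedes_or_equals(Xid a, Xid b)
{
    return (int32_t)(a - b) <= 0;
}

/*
 * Simplified conflict rule (an assumption of this sketch): a standby
 * snapshot conflicts if its xmin is at or before the latest removed xid.
 */
static bool snapshot_conflicts(Xid standby_xmin, Xid latest_removed_xid)
{
    return xid_precedes_or_equals(standby_xmin, latest_removed_xid);
}

/* Step back by one xid, skipping the reserved values on wraparound. */
static Xid xid_retreat(Xid xid)
{
    do {
        xid--;
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}

/* Walk through the scenario from the bug report with made-up numbers. */
static void check_freeze_conflict_example(void)
{
    Xid standby_xmin = 1000;       /* xmin held by the standby backend */
    Xid cutoff_xid = standby_xmin; /* OldestXmin seen by the vacuum */

    /* Passing cutoff_xid directly treats a still-running xid as removed. */
    assert(snapshot_conflicts(standby_xmin, cutoff_xid));

    /* Passing cutoff_xid minus one no longer cancels the standby query. */
    assert(!snapshot_conflicts(standby_xmin, xid_retreat(cutoff_xid)));
}
```

Calling check_freeze_conflict_example() from a main function exercises both cases: with the raw cutoff the standby query is flagged as conflicting, and with the retreated value it is not.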
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
Attachment | Content-Type | Size |
---|---|---|
hs_freeze_offby1.v1.patch | text/plain | 651 bytes |