From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards |
Date: | 2011-10-20 22:20:19 |
Message-ID: | 29674.1319149219@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can
> call GetOldestXmin and then the result will politely hold still while
> it considers what to do next. But in fact, whoever has the oldest xmin
> could exit their transaction, allowing the global minimum to advance.
> If a VACUUM process then inspects the ProcArray, it could compute an
> oldest xmin that is newer than the value that
> ProcessStandbyHSFeedbackMessage installs just afterwards. So much
> for keeping the data the standby wanted.
Actually, it's worse than that. Clamping the walsender's xmin to
GetOldestXmin() is useless, and if it weren't useless, this code would
be completely wrong even discounting the bugs I pointed out already.
Consider what it is we're trying to do here: we'd like to prevent
VACUUMs on the master from deleting dead rows with xmax newer than the
xmin the standby is requesting. Setting the walsender's xmin will only
affect VACUUMs launched after that moment; anything that's already
running will have already determined its threshold xmin, and perhaps
already removed rows. You can't retroactively fix that. So clamping
the walsender's xmin to GetOldestXmin doesn't actually do anything for
data integrity; it only ensures that the value of GetOldestXmin doesn't
go backwards on successive calls. But that's possible anyway, as the
comments for that function already note.
What's worse is that the only thing that is prevented from going
backwards is the value of GetOldestXmin with allDbs = true. But that's
only used by vacuums on shared catalogs. If we wanted to prevent
the values seen by vacuums on non-shared tables from going backwards,
we'd have to do some much-more-complex calculation that identified
the largest value of GetOldestXmin with allDbs = false, across all
databases. And that would *still* be wrong, because then the walsender
would only be protecting data that is newer than the newest such value,
which might allow data in other databases to be reclaimed too early.
You could only really make this work by maintaining per-database xmins,
which the standby isn't sending and there's no place to put into the
walsender's ProcArray entry either.
So I've concluded that there's just no point in the GetOldestXmin
clamping, and we should just apply the xmin value we get from the
standby if it passes basic sanity checks (the epoch test), and hope that
we're not too late to prevent loss of the data the standby wanted.
This line of reasoning also suggests that it's a pretty bad idea for the
walsender to invalidate its existing xmin if it gets a transiently bogus
(out of epoch) value from the standby. I think it should sit on the
xmin it has, instead, to avoid potential loss of needed data.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2011-10-21 01:07:04 | Re: ProcessStandbyHSFeedbackMessage can make global xmin go backwards |
Previous Message | Dan Ports | 2011-10-20 21:55:29 | Re: SSI implementation question |