From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PageIsAllVisible()'s trustworthiness in Hot Standby |
Date: | 2012-12-04 13:38:48 |
Message-ID: | CA+TgmoY=n70HT4SgxZjj-YCr8NpR4pSXzfQ5dUD8m8rXC629Mg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 4, 2012 at 8:08 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> I was looking at the following code in heapam.c:
>
>     /*
>      * If the all-visible flag indicates that all tuples on the page are
>      * visible to everyone, we can skip the per-tuple visibility tests. But
>      * not in hot standby mode. A tuple that's already visible to all
>      * transactions in the master might still be invisible to a read-only
>      * transaction in the standby.
>      */
>     all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
>
> Isn't the check for !snapshot->takenDuringRecovery redundant now in master
> or whenever since we added crash-safety for VM ? In fact, this comment made
> me think if we are really handling index-only scans correctly or not on the
> Hot Standby. But apparently we are by forcing conflicting transactions to
> abort before redoing VM bit set operation on the standby. The same mechanism
> should protect us against the above case. Now I concede that the entire
> magic around setting and clearing the page level all-visible bit and the VM
> bit and our ability to keep them in sync is something I don't fully
> understand, but I see that every operation that sets the page level
> PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer lock and
> emits a WAL record. So AFAICS the conflict resolution logic will take care
> of the above too.

I wasn't sure whether that could be safely changed. There's a subtle
distinction here: the PD_ALL_VISIBLE bit isn't the same as the
visibility map bit. And, technically, the WAL record only fully
protects the setting of *the visibility map bit* not the
PD_ALL_VISIBLE page-level bit. The purpose of that WAL logging is to
make sure that the page-level bit is never clear while the
visibility-map bit is set; it does not guarantee that the page-level
bit can never be set without issuing a WAL record. So, for example,
it's perfectly possible for a crash on the master to leave the
page-level bit set while the VM bit is clear. Now, if that page
somehow makes its way to the standby - via a base backup or a
full-page image - before the tuples it contains are all-visible
according to the standby's xmin horizon, we've got a problem. Can
that happen? It seems unlikely, but can we prove it's not possible?
Perhaps, but I wasn't sure.

Index-only scans are safe, because they're looking at the visibility
map itself, not the page-level bit, but the analysis is a little
murkier for sequential scans.
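
To make the distinction concrete, here is a toy model of the two bits.
It is only a sketch and not PostgreSQL code: ToyPageState and every
function name below are made up for illustration, and the only thing
taken from heapam.c is the shape of the all_visible test quoted above.
It shows that the post-crash state (page-level bit set, VM bit clear)
does not violate the invariant the WAL logging protects, yet it is
exactly the state in which dropping the takenDuringRecovery test would
change what a sequential scan does on the standby.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not PostgreSQL APIs. */
typedef struct
{
    bool page_all_visible;  /* stands in for PD_ALL_VISIBLE on the heap page */
    bool vm_bit;            /* stands in for the visibility map bit */
} ToyPageState;

/* Invariant protected by WAL logging: VM bit set implies page bit set. */
static bool
toy_invariant_holds(ToyPageState s)
{
    return !s.vm_bit || s.page_all_visible;
}

/* Sequential-scan shortcut, same shape as the quoted heapam.c test. */
static bool
toy_seqscan_skips_checks(ToyPageState s, bool taken_during_recovery)
{
    return s.page_all_visible && !taken_during_recovery;
}

/* The same shortcut with the recovery test dropped (the proposed change). */
static bool
toy_seqscan_skips_checks_unguarded(ToyPageState s)
{
    return s.page_all_visible;
}

/* Index-only scans consult the VM bit, never the page-level bit. */
static bool
toy_index_only_skips_heap(ToyPageState s)
{
    return s.vm_bit;
}

/* Walk through the state a crash on the master might leave behind. */
void
toy_demo_crash_state(void)
{
    ToyPageState after_crash = { .page_all_visible = true, .vm_bit = false };

    /* The invariant is intact, so nothing here looks like corruption. */
    assert(toy_invariant_holds(after_crash));

    /* Index-only scan: VM bit is clear, so it still visits the heap. */
    assert(!toy_index_only_skips_heap(after_crash));

    /* Sequential scan on the standby, current code: per-tuple checks run. */
    assert(!toy_seqscan_skips_checks(after_crash, true));

    /* Without the guard, the standby would trust the page-level bit. */
    assert(toy_seqscan_skips_checks_unguarded(after_crash));
}
```

Whether that last case is actually reachable is the open question: the
model says nothing about whether such a page can arrive on the standby
before its tuples are all-visible there.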
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company