Re: heapgetpage() and ->takenDuringRecovery

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: heapgetpage() and ->takenDuringRecovery
Date: 2014-03-03 13:33:38
Message-ID: 20140303133338.GD23352@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I don't think this is necessary >= 9.2. There are only two "interesting"
> > places where PD_ALL_VISIBLE is set:
> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
> > PD_ALL_VISIBLE/the vm is touched and that causes recovery
> > conflicts. The heap page is locked for cleanup at that point. As the
> > logging of xl_heap_clean sets the page's LSN there's no way the page
> > can appear on the standby too early.
> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
> > no need for conflicts. The only other way I can see to end up there
> > is a previous heap_page_prune() that repaired fragmentation. But that
> > logs a WAL record with conflict information.
>
> I don't think there's any reason to believe that lazy_scan_heap() can
> only hit pages that are empty or have just been defragged. Suppose
> that there's a tuple on the page which was recently inserted; the
> inserting transaction has committed but there are some backends that
> still have older snapshots. The page won't be marked all-visible
> because it isn't. Now, eventually those older snapshots will go away,
> and sometime after that the relation will get vacuumed again, and
> we'll once again look at the page. But this time we notice that it is
> all-visible, and mark it so.

Right now I am missing how this isn't an actual correctness problem
after a crash. Without an LSN interlock we could crash *after* the heap
page has been written out, but *before* the vm WAL record has been
flushed to disk. Combined with synchronous_commit=off there could be
transactions that appeared safely committed to vacuum (i.e. are
below GetOldestXmin()), but whose commit records never reached disk, so
they are effectively aborted after the crash.
Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
but that doesn't work here.
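
To make the scenario concrete, here is a toy model of the two rules
involved. It is only a sketch, not PostgreSQL source: every Toy*/toy_*
name is hypothetical, LSNs are plain integers, and the only real
function it mirrors is the XLogNeedsFlush() check mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy types; LSNs are just byte offsets into the WAL. */
typedef uint64_t ToyLSN;

typedef struct ToyWal
{
	ToyLSN		insert_lsn;		/* end of WAL generated so far (in memory) */
	ToyLSN		flushed_lsn;	/* end of WAL known to be durable on disk */
} ToyWal;

typedef struct ToyPage
{
	ToyLSN		page_lsn;		/* LSN of the last WAL record stamped on the page */
	bool		hint_committed; /* stands in for a "xmin committed" hint bit */
	bool		all_visible;	/* stands in for PD_ALL_VISIBLE */
} ToyPage;

/* Append a record of the given length and return its end LSN. */
static ToyLSN
toy_wal_insert(ToyWal *wal, ToyLSN len)
{
	wal->insert_lsn += len;
	return wal->insert_lsn;
}

/* Make WAL durable up to 'upto' (never beyond what has been inserted). */
static void
toy_wal_flush(ToyWal *wal, ToyLSN upto)
{
	if (upto > wal->insert_lsn)
		upto = wal->insert_lsn;
	if (upto > wal->flushed_lsn)
		wal->flushed_lsn = upto;
}

/* Analogue of the XLogNeedsFlush() check: is 'lsn' not yet durable? */
static bool
toy_needs_flush(const ToyWal *wal, ToyLSN lsn)
{
	return lsn > wal->flushed_lsn;
}

/* Hint-bit rule: only set the hint once the commit record is durable. */
static bool
toy_set_commit_hint(const ToyWal *wal, ToyPage *page, ToyLSN commit_lsn)
{
	if (toy_needs_flush(wal, commit_lsn))
		return false;			/* commit not flushed yet, skip the hint */
	page->hint_committed = true;
	return true;
}

/* WAL-before-data rule: a page may be written once WAL <= page_lsn is durable. */
static bool
toy_page_write_allowed(const ToyWal *wal, const ToyPage *page)
{
	return !toy_needs_flush(wal, page->page_lsn);
}

/* Set all-visible WITHOUT touching the page LSN (no interlock). */
static void
toy_set_all_visible_unlocked(ToyPage *page)
{
	page->all_visible = true;
}

/* Set all-visible and stamp the page with the LSN of a new WAL record. */
static void
toy_set_all_visible_interlocked(ToyWal *wal, ToyPage *page)
{
	page->all_visible = true;
	page->page_lsn = toy_wal_insert(wal, 1);
}
```

In this model, after an asynchronous commit whose record is not yet
flushed, toy_set_commit_hint() refuses to set the hint, but a page marked
via toy_set_all_visible_unlocked() still passes toy_page_write_allowed().
With toy_set_all_visible_interlocked() the page cannot be written until
WAL has been flushed past the new record, which necessarily covers the
earlier commit record as well.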

Am I missing something?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
