Re: WAL Bypass for indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Martin Scholes <marty(at)iicolo(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL Bypass for indexes
Date: 2006-04-03 13:55:41
Message-ID: 7498.1144072541@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Thinking about this some more, I ask myself: why is it we log index
> inserts at all? We log heap inserts, which contain all the information
> we need to replay all index inserts also, so why bother?

(1) We can't run user-defined functions during log replay. Quite
aside from any risk of nondeterminism, the normal transaction
infrastructure isn't functioning in that environment.

(2) Some of the index code is itself deliberately nondeterministic.
I'm thinking in particular of the move-right-or-not choice in
_bt_insertonpg() when there are many equal keys, but randomization is
in general a useful algorithmic technique that we'd have to forswear.

(3) In the presence of concurrency, the sequence of heap-insert WAL
records isn't enough info, because it doesn't tell you what order the
index inserts occurred in. The btree code, at least, is sufficiently
concurrent that even knowing the sequence of leaf-key insertions isn't
full information --- it's not hard to imagine cases where decisions
about where to split upper-level pages depend on which process
manages to obtain the lock on a page first.
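
A toy sketch of the ordering problem in (3), in Python rather than
anything resembling the real btree code. The page capacity of 4, the
split-in-half rule, and the names insert() and build() are all
assumptions made up for illustration. It only shows that the same set
of keys, inserted in two different orders, can leave the leaf pages
laid out differently.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical; real btree pages hold far more entries


def insert(pages, key):
    """Insert key into a list of sorted leaf pages; split an overfull page in half."""
    # Pick the first page whose highest key is >= key, else the last page.
    idx = len(pages) - 1
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            idx = i
            break
    page = pages[idx]
    bisect.insort(page, key)
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        pages[idx:idx + 1] = [page[:mid], page[mid:]]


def build(keys):
    """Build the leaf level by inserting keys one at a time, in the given order."""
    pages = [[]]
    for k in keys:
        insert(pages, k)
    return pages


# Two backends insert 50 and 5 concurrently. Suppose the heap-insert WAL
# records appear as 50 then 5, but the index inserts ran as 5 then 50.
wal_order = [10, 20, 30, 40, 50, 5]
actual_order = [10, 20, 30, 40, 5, 50]

replayed = build(wal_order)     # what replay from heap records would construct
original = build(actual_order)  # what actually happened the first time

print(replayed)
print(original)
```

Under these assumptions the replayed layout is [[5, 10, 20], [30, 40, 50]]
while the original is [[5, 10], [20, 30, 40, 50]]: same keys, different
pages.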

There are probably some other reasons that I've forgotten. Check the
archives; this point has been debated before.

Basically the problem here is that you can't mix logged and non-logged
operations --- if you're going to WAL-log any operations on an index
then you have to be sure that the replay will regenerate exactly the
same series of index states that happened the first time. So none of
this is an argument against "rebuild the index at end of replay"; but
I don't see any workable half measures.
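
A minimal sketch of why the two can't be mixed, again as a toy in
Python. It assumes a logged index record that names a page and an
offset instead of searching for the key's position; the record layout
and the names apply_index_record() and is_sorted() are hypothetical.
If the unlogged operations were regenerated into a different layout,
the logged record lands in the wrong place.

```python
def apply_index_record(pages, record):
    """Replay a physical-style record: place key at (page, offset) without searching."""
    page_no, offset, key = record
    pages[page_no].insert(offset, key)


def is_sorted(pages):
    """True if the keys, read across the pages left to right, are in order."""
    flat = [k for page in pages for k in page]
    return flat == sorted(flat)


# Layout that existed when the record was written, and a layout with the
# same keys that a non-logged regeneration might have produced instead.
original = [[5, 10], [20, 30, 40, 50]]
divergent = [[5, 10, 20], [30, 40, 50]]

record = (1, 1, 25)  # hypothetical: insert key 25 on page 1 at offset 1

apply_index_record(original, record)
apply_index_record(divergent, record)

print(original, is_sorted(original))
print(divergent, is_sorted(divergent))
```

Against the original layout the key lands between 20 and 30. Against
the divergent layout it lands after 30, so the page is no longer in
key order.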

regards, tom lane
