From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal for 9.1: WAL streaming from WAL buffers |
Date: | 2010-06-21 09:40:02 |
Message-ID: | 4C1F3372.2090202@enterprisedb.com |
Lists: | pgsql-hackers |

On 21/06/10 12:08, Fujii Masao wrote:
> On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But
>> in 9.1, with sync rep and the performance demands that entails, I
>> think that we're going to need to rethink it.
>
> The problem is not that the master streams non-fsync'd WAL, but that the
> standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> safely if the standby makes the recovery wait until the master has fsync'd
> WAL. That is, walsender sends not only non-fsync'd WAL but also WAL flush
> location to walreceiver, and the standby applies only the WAL which the
> master has already fsync'd. Thought?

I guess, but you have to be very careful to correctly refrain from
applying the WAL. For example, a naive implementation might write the
WAL to disk in walreceiver immediately, but refrain from telling the
startup process about it. If walreceiver is then killed because the
connection is broken (and it will be, because the master just crashed),
the startup process will read the streamed WAL from the file in pg_xlog,
and go ahead to apply it anyway.
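
To make the failure mode above concrete, here is a toy simulation. It is only a sketch under stated assumptions: the `Standby` class, its fields, and `simulate` are hypothetical names invented for illustration, WAL is reduced to a list of integer positions, and nothing here reflects actual PostgreSQL code. It contrasts the naive design (write everything to disk, only tell the startup process about the flushed part) with one possible alternative in which walreceiver holds not-yet-fsync'd WAL in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby server. All names are hypothetical, for illustration only."""
    hold_back_unflushed: bool  # False = the naive design described above
    wal_on_disk: list = field(default_factory=list)  # stands in for pg_xlog
    pending: list = field(default_factory=list)      # walreceiver memory only
    apply_limit: int = 0       # limit walreceiver reported to the startup process
    receiver_alive: bool = True
    applied: list = field(default_factory=list)

    def receive(self, lsns, master_flush_lsn):
        """walreceiver: accept streamed WAL plus the master's flush location."""
        if self.hold_back_unflushed:
            # Only WAL the master has fsync'd ever reaches the standby's disk.
            self.pending.extend(lsns)
            self.wal_on_disk.extend(
                l for l in self.pending if l <= master_flush_lsn)
            self.pending = [l for l in self.pending if l > master_flush_lsn]
        else:
            # Naive: write everything immediately, rely on apply_limit alone.
            self.wal_on_disk.extend(lsns)
        self.apply_limit = master_flush_lsn

    def kill_receiver(self):
        """Connection broke: the in-memory limit and buffer go away."""
        self.receiver_alive = False
        self.pending.clear()

    def replay(self):
        """Startup process: apply WAL found on disk."""
        last = self.applied[-1] if self.applied else 0
        for lsn in self.wal_on_disk:
            if lsn <= last:
                continue  # already applied
            # The limit is only honoured while walreceiver is there to enforce it.
            if self.receiver_alive and lsn > self.apply_limit:
                break
            self.applied.append(lsn)


def simulate(hold_back_unflushed):
    """Master streams LSNs 1..3, has fsync'd only up to 2, then crashes."""
    standby = Standby(hold_back_unflushed=hold_back_unflushed)
    standby.receive([1, 2, 3], master_flush_lsn=2)
    standby.replay()         # streaming replay stops at the flush location
    standby.kill_receiver()  # master crashed, connection is gone
    standby.replay()         # startup process falls back to reading from disk
    return standby.applied
```

In this model `simulate(False)` returns `[1, 2, 3]`: the standby applies a record the master never fsync'd. `simulate(True)` returns `[1, 2]`. Buffering in memory is just one way to avoid the problem in the toy model; a real implementation would have its own trade-offs.
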

So maybe there's some room for optimization there, but given the
round-trip required for the acknowledgment anyway, it might not buy you
much, and the implementation is not very straightforward. This is
clearly 9.1 material, if worth optimizing at all.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com