Re: Sync Rep: First Thoughts on Code

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep: First Thoughts on Code
Date: 2008-12-11 09:29:05
Message-ID: 4940DD61.7030000@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> When the WAL starts streaming the *primary* can immediately perform
>>> synchronous replication, i.e. commit waits for transfer.
>> Until the standby has obtained all the missing log files, it's not
>> up-to-date, and there's no guarantee that it can finish the replay. For
>> example, imagine that your archive_command is an scp from the primary to
>> the standby. If a lightning strikes the primary before some WAL file has
>> been copied over to the archive directory in the standby, the standby
>> can't catch up. In the primary then, what's the point for a commit to
>> wait for transfer, if the reply from the standby doesn't guarantee that
>> the transaction is safe in the standby?
>
> The WAL files will have already left the primary.
>
> Timeline is this in my understanding
> 1 [Primary] Set up continuous archiving
> 2 [Primary] Take base backup
> 3 [Standby] Connect to primary to initiate streaming
> 4 [Primary] Log switch and, optionally, turn off archiving
> 5 [Standby] Begin replaying files, initially from archive
> 6 [Standby] Switch to replaying WAL records immediately after streaming
>
> So sync rep would turn on after step 4, so that all intermediate WAL
> files have been sent to the archive. If we lose the Primary after this
> point then all transactions are accessible to standby. If we lose the
> Standby or Archive, then we need to replace them and re-run the above.

Between steps 4 and 5, there's no guarantee that all WAL files generated
after step 3 but before the start of streaming have already been archived.
There's a delay between writing a WAL file and the point where the file
has been safely archived. If you lose the primary during that window, the
standby will have old WAL files in the archive and the most recent WAL as
received by walreceiver, but it's missing the WAL files generated just
before the switch to streaming mode.
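
To make the window concrete, here is a minimal sketch. It is a toy model
in Python, not code from the patch: segment numbers are plain integers,
and find_missing_segments is a hypothetical helper that reports which
segments the standby can get neither from the archive nor from the
stream.

```python
def find_missing_segments(first_needed_seg, archived, stream_start_seg):
    """Return the segments the standby needs but cannot obtain.

    first_needed_seg: first segment required after the base backup
    archived:         set of segment numbers safely copied to the archive
    stream_start_seg: first segment delivered over the streaming connection
    """
    # Everything before the streaming start must come from the archive.
    needed_from_archive = range(first_needed_seg, stream_start_seg)
    return [seg for seg in needed_from_archive if seg not in archived]


# Hypothetical scenario: the primary wrote segments 10..14 before
# streaming began at 15, but only 10..12 had been archived when the
# primary was lost.
missing = find_missing_segments(10, {10, 11, 12}, 15)
print(missing)
```

In this scenario segments 13 and 14 exist only on the lost primary, so
the standby cannot replay up to the point where streaming started.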

> Recent changes I have requested in the architecture are:
> * making archiving optional on primary, so we don't need to send WAL
> data *twice*.

Agreed. I'm not so much worried about the bandwidth, but it's a lot of
extra work from an administration point of view. It's very hard to get
it right, so that you eliminate windows like the above.

As the patch stands, if you turn off archiving in the primary, and the
standby ever disconnects, even for only a few seconds, the standby will
miss any WAL generated until it reconnects, and without archiving
there's no way for the standby to get hold of the missed WAL.

> * allowing streaming/startup process to work together via shared memory,
> to reduce average replication delay and improve performance
> * skip archiving/de-archiving step on standby because it's superfluous
> (all on this thread)
>
> All of those are fairly minor code changes, but reduce complexity of
> solution and significantly reduce the amount of copying of WAL files (3
> copy actions to/from archive removed without loss of robustness). I
> would have made the suggestions earlier but it wasn't until I saw the
> architecture diagrams that I understood the intention of the code.

To make archiving optional in the primary, I don't see any other choice
than adding the capability for the standby to request arbitrary WAL
files from the primary, over the wire. That seems like a pretty
significant change to walsender: it needs to be able to read WAL not
only from wal_buffers, but from files. That would be a good idea for
performance reasons, too: currently if there's a network glitch and the
primary doesn't get acknowledgements from the standby for a short while,
XLogInserts in the primary will block waiting for the standby after
wal_buffers fills up. That's not a big deal for synchronous replication,
but in asynchronous mode you don't want network glitches like that to
stall the primary.
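
A rough illustration of the stall, as a toy model in Python rather than
the actual walsender logic. ToyPrimary, buffer_slots and can_read_files
are names invented for this sketch; the assumption is that a buffer slot
cannot be reused until its record has been acknowledged, unless
walsender is able to re-read that record from a WAL file later.

```python
class ToyPrimary:
    """Toy model: WAL inserts against a fixed number of buffer slots."""

    def __init__(self, buffer_slots, can_read_files):
        self.buffer_slots = buffer_slots      # stands in for wal_buffers
        self.can_read_files = can_read_files  # walsender may read WAL files
        self.inserted = 0                     # records inserted so far
        self.acked = 0                        # records acknowledged by standby

    def try_insert(self):
        """Return True if the insert proceeds, False if it would block."""
        unacked = self.inserted - self.acked
        if unacked >= self.buffer_slots and not self.can_read_files:
            # Every slot holds unsent data that exists nowhere else
            # walsender can reach, so the insert has to wait.
            return False
        self.inserted += 1
        return True


def inserts_before_stall(primary, attempts):
    """Count how many inserts succeed while the standby sends no acks."""
    done = 0
    for _ in range(attempts):
        if not primary.try_insert():
            break
        done += 1
    return done


# Network glitch: no acknowledgements arrive during 10 insert attempts.
buffers_only = inserts_before_stall(ToyPrimary(4, can_read_files=False), 10)
with_files = inserts_before_stall(ToyPrimary(4, can_read_files=True), 10)
print(buffers_only, with_files)
```

With buffers only, the primary stalls once the four slots are full; with
the file fallback, all ten inserts go through and the standby catches up
from the files once the connection recovers.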

And of course it means changes in the startup code as well. And we'll
need bookkeeping in the primary of what WAL the standby has already
received, so that it doesn't recycle the WAL segments until they've been
sent to the standby. Or alternatively, the primary needs to be able to
retrieve segments from the archive, but then we're dependent on
archiving again.
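
The bookkeeping could look roughly like this. It is a sketch under
stated assumptions, not a proposal for the actual data structures:
segments are plain integers, and recyclable_segments,
checkpoint_redo_seg and standby_needs_from_seg are hypothetical names.
The idea is only that the recycling cutoff becomes the older of the
checkpoint's redo segment and the oldest segment the standby still
needs.

```python
def recyclable_segments(segments_on_disk, checkpoint_redo_seg,
                        standby_needs_from_seg=None):
    """Return the segments the primary may recycle, oldest first.

    checkpoint_redo_seg:    oldest segment needed for crash recovery
    standby_needs_from_seg: oldest segment not yet fully sent to the
                            standby, or None if no standby is tracked
    """
    keep_from = checkpoint_redo_seg
    if standby_needs_from_seg is not None:
        # Never recycle anything the standby has not received yet.
        keep_from = min(keep_from, standby_needs_from_seg)
    return sorted(seg for seg in segments_on_disk if seg < keep_from)


on_disk = {1, 2, 3, 4, 5}
# Without tracking the standby, only the checkpoint limits recycling.
print(recyclable_segments(on_disk, checkpoint_redo_seg=5))
# A standby that still needs segment 3 holds back segments 3 and 4.
print(recyclable_segments(on_disk, checkpoint_redo_seg=5,
                          standby_needs_from_seg=3))
```

The obvious cost is that a standby that stays disconnected holds back
recycling indefinitely, so the primary's WAL directory can keep growing.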

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
