From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Reworking the writing of WAL |
Date: | 2011-08-12 15:34:21 |
Message-ID: | CA+U5nM+3f4yOrR39MqAMXQSsxn28JXzyU8AsL9O8qNL+8NANtg@mail.gmail.com |
Lists: | pgsql-hackers |
I present a number of connected proposals:
1. Earlier, I suggested that the sync rep code would allow us to
redesign the way we write WAL, using ideas from group commit. My
proposal is that when a backend needs to flush WAL to local disk
it will be added to a SHMQUEUE, exactly as we already do when waiting
for WAL to reach a sync standby. The WALWriter will be woken by its latch
and will then perform the actual work. When it has finished, the WALWriter will wake the queue in order,
so there is a natural group commit effect. The WAL queue will be
protected by a new lock, WALFlushRequestLock, which should be much less
heavily contended than the current WALWriteLock-based scheme. Notably, this approach
means that all waiters get woken quickly, without having to wait
for the queue of WALWriteLock requests to drain down, so commit will
be marginally quicker. On almost idle systems this will give very
nearly the same response time as having each backend write WAL
directly. On busy systems it will give optimal efficiency by having the
WALWriter work in a very tight loop to perform the I/O instead of
queuing itself for WALWriteLock with all the other backends. It
will also allow piggybacking of commits even when WALInsertLock is not
available.
2. A further idea is to use the same queue to reduce contention when
accessing the ProcArray and Clog at end of transaction. That
would not be part of the initial work, but I'd want to keep that
possibility in mind at the design stage, at least where there are
choices to make.
3. In addition, we will send WAL to standby servers as soon as it
has been written, rather than waiting until it has been flushed. In the
chunk header the WALSender would include the currently known WAL flush ptr.
So we would be sending WAL data to the standby ahead of it being flushed,
but then only applying data up to the flush ptr. Rather than flushing WAL
fully and only then sending it, we would partially overlap those
operations, which also gives us the option of not fsyncing remotely for
additional speed (DRBD 'B' mode).
4. I'm tempted by the idea of having backends write their commit
records but not flush them, which fits in with the above.
5. And we would finally get rid of the group commit parameters
(commit_delay and commit_siblings).
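To make the flush queue of point 1 concrete, here is a minimal single-threaded sketch in C. It is not PostgreSQL code: the names WalFlushQueue, FlushRequest, request_flush() and walwriter_cycle() are hypothetical, latches are modelled as plain flags, and WALFlushRequestLock is only marked in comments rather than taken. It shows the group commit effect: backends queue flush requests, and one WALWriter pass performs a single write+fsync up to the furthest requested position and then wakes every waiter in arrival order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* a WAL position (byte offset) */

#define MAX_REQUESTS 64
#define MAX_WOKEN    256

/* A backend waiting until WAL is durable up to `upto`.
 * Hypothetical stand-in for a PGPROC linked into a SHMQUEUE. */
typedef struct FlushRequest {
    int        backend_id;
    XLogRecPtr upto;
} FlushRequest;

/* Hypothetical shared state. In a server this would live in shared memory
 * and every access would hold WALFlushRequestLock; this single-threaded
 * sketch omits the lock and models latches as plain flags. */
typedef struct WalFlushQueue {
    FlushRequest req[MAX_REQUESTS];     /* FIFO, oldest request first */
    int          nreq;
    bool         writer_latch_set;      /* stands in for the WALWriter's latch */
    XLogRecPtr   flushed;               /* highest LSN known to be on disk */
    int          nflushes;              /* simulated write+fsync calls */
    int          woken[MAX_WOKEN];      /* backend ids, in wake-up order */
    int          nwoken;
} WalFlushQueue;

static void flush_queue_init(WalFlushQueue *q)
{
    *q = (WalFlushQueue){0};
}

/* Backend side: if WAL is already durable far enough, return at once;
 * otherwise join the queue and set the WALWriter's latch. A real backend
 * would then sleep on its own latch until the WALWriter wakes it. */
static void request_flush(WalFlushQueue *q, int backend_id, XLogRecPtr upto)
{
    if (upto <= q->flushed) {
        assert(q->nwoken < MAX_WOKEN);
        q->woken[q->nwoken++] = backend_id;
        return;
    }
    assert(q->nreq < MAX_REQUESTS);
    q->req[q->nreq++] = (FlushRequest){backend_id, upto};
    q->writer_latch_set = true;
}

/* WALWriter side: one pass of its main loop. */
static void walwriter_cycle(WalFlushQueue *q)
{
    if (!q->writer_latch_set)
        return;                          /* nothing queued: keep sleeping */
    q->writer_latch_set = false;         /* analogous to resetting the latch */

    /* The furthest position any queued backend is waiting for. */
    XLogRecPtr target = q->flushed;
    for (int i = 0; i < q->nreq; i++)
        if (q->req[i].upto > target)
            target = q->req[i].upto;

    /* A single write+fsync satisfies every queued request: group commit. */
    if (target > q->flushed) {
        q->flushed = target;
        q->nflushes++;
    }

    /* Wake all waiters in queue order, then empty the queue. */
    for (int i = 0; i < q->nreq; i++) {
        assert(q->nwoken < MAX_WOKEN);
        q->woken[q->nwoken++] = q->req[i].backend_id;
    }
    q->nreq = 0;
}
```

For example, if three backends request flushes to positions 100, 250 and 180 before the WALWriter runs, one cycle issues a single flush to 250 and wakes backends 1, 2 and 3 in that order. A later request for position 200 returns immediately without waking the WALWriter at all.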
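Point 3 (send WAL once written, apply only up to the primary's flush ptr) can likewise be sketched under stated assumptions. The WalChunkHeader layout and the functions walsender_make_chunk(), standby_receive() and standby_apply() are hypothetical illustrations, not PostgreSQL's actual streaming replication protocol. The key invariant is that the standby may hold WAL beyond the primary's flush ptr but never replays past it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical chunk header; not PostgreSQL's actual streaming protocol. */
typedef struct WalChunkHeader {
    XLogRecPtr data_start;  /* WAL position of the first byte carried */
    XLogRecPtr data_end;    /* just past the last byte: primary's write ptr */
    XLogRecPtr flush_ptr;   /* primary's flush ptr when the chunk was sent */
} WalChunkHeader;

/* Primary side: send everything written but not yet sent, and piggyback
 * the currently known flush ptr. The data may run ahead of flush_ptr. */
static WalChunkHeader walsender_make_chunk(XLogRecPtr sent_upto,
                                           XLogRecPtr write_ptr,
                                           XLogRecPtr flush_ptr)
{
    assert(sent_upto <= write_ptr && flush_ptr <= write_ptr);
    return (WalChunkHeader){sent_upto, write_ptr, flush_ptr};
}

typedef struct Standby {
    XLogRecPtr received_upto;    /* WAL held locally, maybe not durable on primary */
    XLogRecPtr primary_flushed;  /* newest flush ptr reported by the primary */
    XLogRecPtr applied_upto;     /* replay position; never passes primary_flushed */
} Standby;

static void standby_receive(Standby *s, const WalChunkHeader *h)
{
    assert(h->data_start == s->received_upto);  /* chunks arrive in order */
    /* The bytes would be stored here; whether to fsync them before
     * acknowledging is the DRBD 'B'-style choice from point 3. */
    s->received_upto = h->data_end;
    if (h->flush_ptr > s->primary_flushed)
        s->primary_flushed = h->flush_ptr;
}

static void standby_apply(Standby *s)
{
    XLogRecPtr limit = s->primary_flushed < s->received_upto
                       ? s->primary_flushed : s->received_upto;
    if (limit > s->applied_upto)
        s->applied_upto = limit;   /* replay records in [applied_upto, limit) */
}
```

In this sketch a standby that receives bytes 0 to 1000 with a flush ptr of 400 replays only up to 400. A later zero-length chunk (data_start == data_end, an assumed convention) that carries a flush ptr of 1000 then lets replay catch up without resending any data.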
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services