Better handling of archive_command problems

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Better handling of archive_command problems
Date: 2013-05-13 22:02:16
Message-ID: CAM3SWZQcyNxvPaskr-pxm8DeqH7_qevW7uqbhPCsg1FpSxKpoQ@mail.gmail.com
Lists: pgsql-hackers

The documentation says of continuous archiving:

"While designing your archiving setup, consider what will happen if
the archive command fails repeatedly because some aspect requires
operator intervention or the archive runs out of space. For example,
this could occur if you write to tape without an autochanger; when the
tape fills, nothing further can be archived until the tape is swapped.
You should ensure that any error condition or request to a human
operator is reported appropriately so that the situation can be
resolved reasonably quickly. The pg_xlog/ directory will continue to
fill with WAL segment files until the situation is resolved. (If the
file system containing pg_xlog/ fills up, PostgreSQL will do a PANIC
shutdown. No committed transactions will be lost, but the database
will remain offline until you free some space.)"

I think it is not uncommon for archiving to fall seriously
behind, risking a major loss of availability. When this happens, the
DBA has to fight against the clock to fix whatever problem there is
with continuous archiving, hoping to catch up and prevent a PANIC
shutdown. This is a particularly unpleasant problem to have.

At Heroku, we naturally monitor the state of continuous archiving on
all clusters under our control. However, when faced with this
situation, sometimes the least-worst option to buy time is to throttle
Postgres using a crude mechanism: issuing repeated SIGSTOP and SIGCONT
signals to all Postgres processes, with the exception of the archiver
auxiliary process. Obviously this is a terrible thing to have to do,
principally because it slows almost everything right down. It would be
far preferable to just slow down the writing of WAL segments when
these emergencies arise, since that alone is what risks causing a
PANIC shutdown when XLogWrite() cannot write WAL. Even if the pg_xlog
directory is on the same filesystem as the database heap files,
throttling WAL naturally also throttles the operations that would
enlarge those heap files. Reads (including the writes that enable
reads, such as those performed by the background writer and by backends
cleaning dirty buffers) and checkpointing are largely unaffected
(though checkpointing does have to write checkpoint WAL records, so not
entirely).
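
For illustration, here is roughly what that crude throttle amounts to;
this is a minimal sketch and not what we actually run (the pid list on
the command line, the fixed 500ms cycle, and the complete absence of
error handling and archiver exclusion are all simplifications):

/*
 * Crude throttle: repeatedly SIGSTOP and SIGCONT the given pids.  In
 * practice you would pass every Postgres process except the archiver.
 */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    const long pause_ms = 500;  /* how long to keep processes stopped */
    int        i;

    for (;;)
    {
        for (i = 1; i < argc; i++)
            kill((pid_t) atol(argv[i]), SIGSTOP);   /* pause */

        usleep(pause_ms * 1000);

        for (i = 1; i < argc; i++)
            kill((pid_t) atol(argv[i]), SIGCONT);   /* resume */

        usleep(pause_ms * 1000);
    }

    return 0;
}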

What I'd like to propose is that we simply sit on WALWriteLock for a
configured delay in order to throttle the writing (though not the
insertion) of WAL records. I've drafted a patch that does just that -
it has the WAL Writer optionally sleep on the WALWriteLock for some
period of time once per activity cycle (avoiding WAL Writer
hibernation). If this sounds similar to commit_delay, that's because
it is almost exactly the same. We just sleep within the WAL Writer
rather than in a group commit leader backend, because that way the
delay doesn't depend upon some backend hitting the
XLogFlush()/commit_delay codepath. In a
bulk loading situation, it's perfectly possible for no backend to
actually hit XLogFlush() with any sort of regularity, so commit_delay
cannot really be abused to do what I describe here. Besides, right now
commit_delay is capped so that it isn't possible to delay for more
than 1/10th of a second.
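
To make the mechanism concrete, here is a rough sketch of the kind of
thing the patch does; wal_write_throttle is a hypothetical GUC name (a
delay in milliseconds) used only for illustration, and the real patch's
integration with the WAL Writer activity cycle is more involved:

/*
 * Hypothetical sketch: once per WAL Writer activity cycle, hold
 * WALWriteLock for the configured delay so that any backend trying to
 * write or flush WAL blocks behind us.  WAL insertion is unaffected.
 */
static void
WalWriterThrottle(void)
{
    if (wal_write_throttle > 0)
    {
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        pg_usleep(wal_write_throttle * 1000L);      /* ms -> usec */
        LWLockRelease(WALWriteLock);
    }
}

The effect is much like the sleep commit_delay performs in a group
commit leader backend, except that it happens within the WAL Writer on
every activity cycle, regardless of whether any backend reaches
XLogFlush().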

What I've proposed here has the disadvantage of making activity rounds
of the WAL Writer take longer, thus considerably increasing the window
before any asynchronous commits actually make it out to disk. However,
that's a problem inherent in any throttling of WAL writing as described
here (XLogBackgroundFlush() itself acquires WALWriteLock anyway), so I
don't imagine that anything can be done about it other than having a
clear warning. I envisage this feature as very much a sharp tool, to be
used by the DBA only when they are in a very tight bind. Better to at
least be able to handle read queries while this problem persists, and
not to throttle longer-running transactions whose writes don't need to
make it out to disk right away. I also have a notion that we can
usefully throttle WAL writing less aggressively than almost or entirely
preventing it: a third-party monitoring daemon could scale the
throttling delay up or down as a function of how full the pg_xlog
filesystem is. It might be better to modestly throttle WAL writing for
two hours in order to allow continuous archiving to catch up, rather
than sharply curtailing it for a shorter period.
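
As a sketch of that last idea (entirely an assumption about how such a
daemon might behave; the pg_xlog path, the 80% threshold and the
10-second ceiling are invented for illustration), an external monitor
could derive a delay from filesystem fullness along these lines:

/*
 * Hypothetical monitoring sketch: map pg_xlog filesystem fullness to a
 * suggested throttle delay.  The daemon would then feed this value
 * back to the server, e.g. by adjusting the hypothetical throttling
 * setting described above.
 */
#include <stdio.h>
#include <sys/statvfs.h>

int
main(void)
{
    struct statvfs fs;
    double         used;
    long           delay_ms;

    if (statvfs("/var/lib/postgresql/pg_xlog", &fs) != 0)
    {
        perror("statvfs");
        return 1;
    }

    used = 1.0 - (double) fs.f_bavail / (double) fs.f_blocks;

    /* No throttling below 80% full; scale linearly up to 10s at 100%. */
    if (used < 0.80)
        delay_ms = 0;
    else
        delay_ms = (long) ((used - 0.80) / 0.20 * 10000.0);

    printf("suggested WAL write throttle: %ld ms\n", delay_ms);
    return 0;
}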

Has anyone else thought about approaches to mitigating the problems
that arise when an archive_command continually fails, and the DBA must
manually clean up the mess?

--
Peter Geoghegan
