From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments |
Date: | 2014-12-05 00:28:54 |
Message-ID: | 20141205002854.GE21964@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
We've recently observed a case where, after a promotion, a postgres
server suddenly started to archive a large amount of old WAL.
After some digging the problem is this:
pg_basebackup -X creates files in pg_xlog/ without creating the
corresponding .done file. Note that walreceiver *does* create them. The
standby in this case, just like the master, had a significant
wal_keep_segments. RemoveOldXlogFiles() then, during recovery restart
points, calls XLogArchiveCheckDone() which in turn does:
/* Retry creation of the .ready file */
XLogArchiveNotify(xlog);
return false;
if there's neither a .done nor a .ready file present and archive_mode is
enabled. These segments then aren't removed because there's a .ready
present and they're never archived as long as the node is a standby
because we don't do archiving on standbys.
Once the node is promoted archiver will be started and suddenly archive
all these files - which might be months old.
And additional, at first strange, nice detail is that a lot of the
.ready files had nearly the same timestamps. Turns out that's due to
wal_keep_segments. Initially RemoveOldXlogFiles() doesn't process the
files because they're newer than allowed due to wal_keep_segments. Then
every checkpoint a couple segments would be old enough to reach
XLogArchiveCheckDone() which then'd create the .ready marker... But not
all at once :)
So I think we just need to make pg_basebackup create to .ready
files. Given that the walreceiver and restore_command already
unconditionally do XLogArchiveForceDone() I think we'd follow the
established precedent. Arguably it could make sense to archive files
again on the standby after a promotion as they aren't guaranteed to have
been on the then primary. But we don't have any infrastructure anyway
for that and walsender doesn't do so, so it doesn't seem to make any
sense to do that for pg_basebackup.
Independent from this bug, there's also some debatable behaviour about
what happens if a node with a high wal_keep_segments turns on
archive_mode. Suddenly all those old files are archived... I think it
might be a good idea to simply always create .done files when
archive_mode is disabled while a wal segment is finished.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Matt Newell | 2014-12-05 00:30:19 | Re: libpq pipelining |
Previous Message | Michael Paquier | 2014-12-04 23:58:33 | Re: New wal format distorts pg_xlogdump --stats |