From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot standby, recovery infra |
Date: | 2009-02-05 19:54:50 |
Message-ID: | 498B440A.1030101@enterprisedb.com |
Lists: | pgsql-hackers |
Ok, here's another version. Major changes since last patch:
- Startup checkpoint is now again performed after the recovery is
finished, before allowing (read-write) connections. This is because we
couldn't solve the problem of re-entering recovery after a crash before
the first online checkpoint.
- minSafeStartPoint is gone, and its functionality has been folded into
minRecoveryPoint. It was really the same semantics. There might have
been some debugging value in keeping the backup stop time around, but
it's in the backup label file in the base backup anyway.
- minRecoveryPoint is now updated in XLogFlush, instead of when a file
is restored from archive.
- log_restartpoints is gone. Use log_checkpoints in postgresql.conf
instead.
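
To illustrate the minRecoveryPoint change above, here is a minimal
standalone sketch of the idea, not code from the attached patch. It
assumes that a flush request during recovery only ever moves the
recorded point forward. The names WalPos, ControlState,
update_min_recovery_point and reached_min_recovery_point are
hypothetical, and the WAL position is simplified to a single integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a WAL location (hypothetical, not the real type). */
typedef uint64_t WalPos;

/* Hypothetical model of the control-file state that matters here. */
typedef struct
{
    WalPos min_recovery_point;   /* recovery must replay at least this far */
    int    control_file_writes;  /* counts simulated control-file updates */
} ControlState;

/*
 * Called when a flush up to 'flush_upto' is requested during recovery.
 * The recorded point only advances; a request at or behind the current
 * value is a no-op. Returns true if the point was advanced.
 */
bool
update_min_recovery_point(ControlState *cs, WalPos flush_upto)
{
    assert(cs != NULL);

    if (flush_upto <= cs->min_recovery_point)
        return false;

    cs->min_recovery_point = flush_upto;
    cs->control_file_writes++;
    return true;
}

/* True once replay has reached the recorded minimum recovery point. */
bool
reached_min_recovery_point(const ControlState *cs, WalPos replayed_upto)
{
    assert(cs != NULL);
    return replayed_upto >= cs->min_recovery_point;
}
```

In this model, a request behind the recorded point costs nothing, so
only flushes that actually extend the point cause a control-file write.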
Outstanding issues:
- If bgwriter is performing a restartpoint when recovery ends, the
startup checkpoint will be queued up behind the restartpoint. And since
it uses the same smoothing logic as checkpoints, it can take quite some
time for that to finish. The original patch had some code to hurry up
the restartpoint by signaling the bgwriter if
LWLockConditionalAcquire(CheckPointLock) fails, but there's a race
condition with that if a restartpoint starts right after that check. We
could let the bgwriter do the checkpoint too, and wait for it, but
bgwriter might not be running yet, and we'd have to allow bgwriter to
write WAL while disallowing it for all other processes, which seems
quite complex. Seems like we need something like the
LWLockConditionalAcquire approach, but built into CreateCheckPoint to
eliminate the race condition.
- If you perform a fast shutdown while the startup process is waiting
for the restore command, the startup process sometimes throws a FATAL
error, which escalates into an immediate shutdown. That leads to
different messages in the logs, and to skipping the shutdown
restartpoint that we now otherwise perform.
- It's not clear to me if the rest of the xlog flushing related
functions, XLogBackgroundFlush, XLogNeedsFlush and XLogAsyncCommitFlush,
need to work during recovery, and what they should do.
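
For the first outstanding item above, the race comes from checking
whether a restartpoint is running and only then deciding whether to
signal. One possible way to close that window, sketched here as a
standalone model and not as the patch's approach, is to make the hurry
request sticky and set it with a single atomic operation. A restartpoint
then sees it whether it was already running or starts afterwards. All
names below (CheckpointCoord, request_hurry, begin_restartpoint,
should_hurry, end_restartpoint) are hypothetical, and C11 atomics stand
in for the real locking primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESTARTPOINT_RUNNING 0x1u
#define HURRY_REQUESTED      0x2u

/* Hypothetical shared state between the startup process and bgwriter. */
typedef struct
{
    atomic_uint flags;
} CheckpointCoord;

/*
 * Startup process side: record the request unconditionally.
 * Returns true if a restartpoint was in progress at that moment.
 */
bool
request_hurry(CheckpointCoord *c)
{
    unsigned prev = atomic_fetch_or(&c->flags, HURRY_REQUESTED);

    return (prev & RESTARTPOINT_RUNNING) != 0;
}

/*
 * bgwriter side: mark the restartpoint as started.
 * Returns true if a hurry request was already pending, in which case
 * the caller would skip the smoothing delays from the start.
 */
bool
begin_restartpoint(CheckpointCoord *c)
{
    unsigned prev = atomic_fetch_or(&c->flags, RESTARTPOINT_RUNNING);

    return (prev & HURRY_REQUESTED) != 0;
}

/* bgwriter side: polled between page writes while a restartpoint runs. */
bool
should_hurry(CheckpointCoord *c)
{
    return (atomic_load(&c->flags) & HURRY_REQUESTED) != 0;
}

/* bgwriter side: clear the running bit; the hurry request stays set. */
void
end_restartpoint(CheckpointCoord *c)
{
    atomic_fetch_and(&c->flags, ~RESTARTPOINT_RUNNING);
}
```

Because the request is never conditional on the running bit, neither
ordering of request_hurry and begin_restartpoint loses the request.
The sketch leaves out how the startup process would then wait for the
restartpoint to finish.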
I'll continue working on those outstanding items.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
recovery-infra-41d3bcb.patch | text/x-diff | 63.0 KB |