Re: For what should pg_stop_backup wait?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: For what should pg_stop_backup wait?
Date: 2008-08-08 11:57:14
Message-ID: 1218196634.4549.602.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Fri, 2008-08-08 at 11:47 +0900, Fujii Masao wrote:
> On Thu, Aug 7, 2008 at 11:34 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >

> In this situation, the history file (000000010000000000000004.00000020.backup)
> is behind the stoppoint (000000010000000000000004) in the alphabetic order.
> So, pg_stop_backup should wait for both the stoppoint and the history
> file, I think.

OK, I see that now.

>
> > ! while (!XLogArchiveCheckDone(stopxlogfilename, false))
>
> If a concurrent checkpoint removes the status file before XLogArchiveCheckDone,
> pg_stop_backup continues waiting forever. This is undesirable behavior.

I think it will only get removed by the second checkpoint, not the
first. So the risk of that happening seems almost certainly impossible.
But we'll put in a check just in case.

> Yes, statement_timeout may help. But, I don't want to use it, because the
> *successful* backup is canceled.
>
> How about checking whether the stoppoint was archived by comparing with
> the last WAL archived. The archiver process can tell the last WAL archived.
> Or, we can calculate it from the status file.

I think its easier to test whether the stopxlogfilename still exists in
pg_xlog. If not, we know it has been archived away. We can add that as
an extra condition inside the loop.

So thinking we should test XLogArchiveCheckDone() for both
stopxlogfilename and history file and then stat for the stop WAL file:

BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
startpoint.xrecoff % XLogSegSize);

seconds_before_warning = 60;
waits = 0;

while (!XLogArchiveCheckDone(histfilepath, false) ||
!XLogArchiveCheckDone(stopxlogfilename, false))
{
struct stat stat_buf;
char xlogpath[MAXPGPATH];

/*
* Check to see if file has already been archived and WAL file
* removed by a concurrent checkpoint
*/
snprintf(xlogpath, MAXPGPATH, XLOGDIR "/%s", stopxlogfilename);
if (XLogArchiveCheckDone(histfilepath, false) &&
stat(xlogpath, &stat_buf) != 0)
break;

CHECK_FOR_INTERRUPTS();

pg_usleep(1000000L);

if (++waits >= seconds_before_warning)
{
seconds_before_warning *= 2; /* This wraps in >10 years... */
elog(WARNING, "pg_stop_backup() waiting for archive to complete "
"(%d seconds delay)", waits);
}
}

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Volkan YAZICI 2008-08-08 12:00:40 Verbosity of Function Return Type Checks
Previous Message 崔岩ccuiyyan 2008-08-08 11:16:39 Oprofile with postgresql