Re: pg_basebackup -x stream from the standby gets stuck

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup -x stream from the standby gets stuck
Date: 2012-02-28 08:22:39
Message-ID: CAHGQGwGNtDu=Nezt00nVn=u4N=1g3NvxkGXHe3dfO6ffu-KRow@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Hi,
>>
>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>> xlog start point: 2/AC4E2600
>>> pg_basebackup: starting background WAL receiver
>>> 692447/692447 kB (100%), 1/1 tablespace
>>> xlog end point: 2/AC4E2600
>>> pg_basebackup: waiting for background process to finish streaming...
>>> pg_basebackup: base backup completed
>>>
>>> real    3m56.237s
>>> user    0m0.224s
>>> sys     0m0.936s
>>>
>>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>>
>> The above article points out the problem of pg_basebackup from the standby:
>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>> there is no traffic in the database.
>>
>> When "-x stream" is specified, pg_basebackup forks the background process
>> for receiving WAL records during backup, takes an online backup and waits for
>> the background process to end. The forked background process keeps receiving
>> WAL records, and whenever it reaches end of WAL file, it checks whether it has
>> already received all WAL files required for the backup, and exits if yes. Which
>> means that at least one WAL segment switch is required for pg_basebackup with
>> "-x stream" option to end.
>>
>> In the backup from the master, WAL file switch always occurs at both start and
>> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
>> above logic works fine even if there is no traffic. OTOH, in the backup from the
>> standby, while there is no traffic, WAL file switch is not performed at all. So
>> in that case, there is no chance that the background process reaches the end of
>> a WAL file, checks whether all required WAL has arrived, and exits. As a result,
>> pg_basebackup gets stuck.
>>
>> To fix the problem, I'd propose to change the background process so that it
>> checks whether all required WAL has arrived, every time data is received, even
>> if end of WAL file is not reached. Patch attached. Comments?
>
> This seems like a good thing in general.
>
> Why does it need to modify pg_receivexlog, though? I thought only
> pg_basebackup had this issue?
>
> I guess it is because of the change of the API to
> stream_continue_callback only?

Yes, that's why I changed continue_streaming() in pg_receivexlog.c.

But I changed segment_callback() in pg_receivexlog.c for a different reason.
Previously, segment_finish_callback was called only at the end of a WAL
segment, but with the patch it can also be called in the middle of a segment.
OTOH, segment_callback() must emit its verbose message only when the current
WAL segment is complete. So I had to add a check to segment_callback() for
whether the current WAL segment is partial or complete.
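
To make the difference concrete, here is a minimal standalone sketch of the
two behaviors. It is not the patch and does not use the real callback
signatures. The names (WalPos, sketch_*) are hypothetical, a WAL position is
simplified to a flat 64-bit byte offset, and the segment size is assumed to
be the default 16 MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification: a WAL position as a flat byte offset. */
typedef uint64_t WalPos;

/* Assumed segment size for this sketch: 16 MB, the PostgreSQL default. */
#define SKETCH_SEG_SIZE ((WalPos) 16 * 1024 * 1024)

/* True when pos lies exactly on a segment boundary, i.e. the segment
 * that was being written up to pos is complete rather than partial. */
static bool
sketch_segment_complete(WalPos pos, WalPos seg_size)
{
    assert(seg_size > 0);       /* guard the modulo below */
    return pos != 0 && pos % seg_size == 0;
}

/* Old behavior: the "have we received everything?" check runs only
 * when the end of a WAL file is reached. */
static bool
sketch_should_stop_old(WalPos received, WalPos backup_end)
{
    if (!sketch_segment_complete(received, SKETCH_SEG_SIZE))
        return false;           /* mid-segment: never even checks */
    return received >= backup_end;
}

/* Proposed behavior: the check runs every time data is received,
 * whether or not a segment boundary was reached. */
static bool
sketch_should_stop_new(WalPos received, WalPos backup_end)
{
    return received >= backup_end;
}

/* pg_receivexlog side: because the callback can now fire mid-segment,
 * the verbose "segment finished" message must be gated on completeness. */
static bool
sketch_should_report_segment(WalPos received)
{
    return sketch_segment_complete(received, SKETCH_SEG_SIZE);
}
```

With an idle standby, the received position can pass the backup end point
without ever reaching a segment boundary. In that case the old check never
returns true, while the new one does as soon as the data arrives.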

> Looking at it after your patch,
> stream_continue_callback and segment_finish_callback are the same.
> Should we perhaps just fold them into a single
> stream_continue_callback? Since you had to move the "detect segment
> end" to the caller anyway?

No. I think we cannot do that because in pg_receivexlog they are not the same.

> Another question related to this - since we clearly don't need the
> xlog switch in this case, should we make it conditional on the master
> as well, so we don't switch unnecessarily there as well?

Maybe. At the end of a backup, we force a WAL segment switch to ensure that all
required WAL files have been archived. So theoretically, if WAL archiving is not
enabled, we can skip the WAL segment switch. But some backup tools might depend
on this behavior....

In a standby-only backup, we always skip the WAL segment switch. So there is no
guarantee that all WAL files required for the backup are archived at the end of
the backup. This limitation is documented.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
