BUG #14321: pg_basebackup --xlog-method=stream fails

From: juergen+postgresql(at)strobel(dot)info
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14321: pg_basebackup --xlog-method=stream fails
Date: 2016-09-09 16:58:46
Message-ID: 20160909165846.20024.16221@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14321
Logged by: Jürgen Strobel
Email address: juergen+postgresql(at)strobel(dot)info
PostgreSQL version: 9.5.4
Operating system: CentOS7
Description:

Hello everyone,

Quite often while running pg_basebackup --xlog-method=stream I get the
following warning:

pg_basebackup: could not receive data from WAL stream: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The filesystem backup continues successfully to its end, but it concludes
without the necessary WAL files. I verified in pg_stat_replication that
pg_basebackup is not trying to reconnect to the master.
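
The pg_stat_replication check described above could be sketched as follows.
This is only an illustration under stated assumptions: that pg_basebackup
shows up with application_name 'pg_basebackup', and that its WAL streaming
connection reports state 'streaming' while the base backup connection
reports 'backup'. The helper name has_streaming_backup is hypothetical, and
the rows are expected to come from whatever client runs the query.

```python
# Plain SQL against the pg_stat_replication view; run it with any client
# and pass the resulting rows to has_streaming_backup().
WAL_SENDER_QUERY = (
    "SELECT pid, application_name, state "
    "FROM pg_stat_replication"
)


def has_streaming_backup(rows, app_name="pg_basebackup"):
    """Return True if any row looks like a live WAL stream from pg_basebackup.

    rows is an iterable of (pid, application_name, state) tuples.
    Assumption: the WAL streaming connection reports state 'streaming'.
    """
    return any(
        app == app_name and state == "streaming"
        for _pid, app, state in rows
    )
```

If this returns False while the filesystem copy is still running, the WAL
stream has gone away and has not been re-established.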

I am running this in a VM, taking backups of live DBs of ~300-900GB.
Sometimes IO spikes seem to cause hangs longer than the server's
wal_sender_timeout, which is at the default of 60s. The VM has far fewer
resources than the upstream DB. I don't really want to increase
wal_sender_timeout because there are other (non-backup) HA standbys too, and
I wouldn't know what value to raise it to.

I understand how to repair this manually and it's not an end-of-the-world
bug, but it would be nice if pg_basebackup just reconnected the streaming
WAL connection in the same way as pg_receivexlog does. This matters
especially because the error happens in a long script run by cron and/or by
other people who do not have this insight.
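
Until pg_basebackup reconnects by itself, a cron script can at least refuse
to treat such a backup as good. The following is a minimal sketch, not a
fix: it scans stderr for the warning text quoted above and restarts the
whole backup in a fresh directory, which is expensive for databases of this
size. The function names, the retry count, and the ".N" directory suffix are
hypothetical; connection settings are assumed to come from the environment
(PGHOST, PGUSER, and so on). The exit status is checked as well, since this
report does not establish what it is in this situation.

```python
import subprocess
import sys

# Text copied from the warning quoted in this report.
WAL_STREAM_ERROR = "could not receive data from WAL stream"


def backup_is_usable(returncode, stderr_text):
    """Usable only if the exit status is 0 and no WAL stream failure was reported."""
    return returncode == 0 and WAL_STREAM_ERROR not in stderr_text


def run_backup(target_dir, attempts=3, runner=subprocess.run):
    """Run pg_basebackup up to `attempts` times, each into a new directory.

    Returns the directory of the first usable backup, or None if every
    attempt failed. `runner` is injectable so the logic can be tested
    without a server.
    """
    for attempt in range(1, attempts + 1):
        # pg_basebackup wants an empty or nonexistent target directory,
        # so every attempt gets its own.
        attempt_dir = "{}.{}".format(target_dir, attempt)
        result = runner(
            ["pg_basebackup", "-D", attempt_dir, "--xlog-method=stream"],
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if backup_is_usable(result.returncode, result.stderr):
            return attempt_dir
        sys.stderr.write(
            "attempt {} in {} is unusable, WAL may be missing\n".format(
                attempt, attempt_dir
            )
        )
    return None
```

A caller would exit non-zero when run_backup() returns None, so that cron
reports the failure instead of leaving a backup without its WAL files.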

I haven't had time to try 9.6's --slot option yet, but I suspect this won't
be a full cure either unless it also changes the re-connect behavior.

Best regards,
Jürgen Strobel
