Re: pg_basebackup: return value 1: reason?

From: Andrej Vanek <andrej(dot)vanek(dot)sk(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_basebackup: return value 1: reason?
Date: 2016-05-23 16:04:52
Message-ID: CAFNFRyEppbLWhYoGGGdvcctgL0OAjpZL1mVKSF5UisF=WCOFuA@mail.gmail.com
Lists: pgsql-general

Hello,
I've given it another try.
Two variants used in my script (launched by crm_mon):
1. /usr/pgsql-9.5/bin/pg_basebackup -U pgreplic -h db-other-site -w -D /opt/geo_stdby_data -c fast -vvv -X stream &>> /tmp/log
2. strace -o /tmp/pg_basebackup.log /usr/pgsql-9.5/bin/pg_basebackup -U pgreplic -h db-other-site -w -D /opt/geo_stdby_data -c fast -vvv -X stream &>> /tmp/log
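Both runs in the details below end with a RETVAL=<n> line written by the calling script. The actual script is not shown in the thread, so here is only a minimal POSIX sh sketch of a wrapper that appends a command's output and exit status to a log in that form. run_logged is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical wrapper (not the original script): run a command, append its
# stdout and stderr to a log file, then append "RETVAL=<exit status>" and
# return that same status to the caller.
run_logged() {
    local log rc
    log=$1
    shift
    # ">> file 2>&1" is the POSIX spelling of bash's "&>> file".
    "$@" >> "$log" 2>&1
    rc=$?
    echo "RETVAL=$rc" >> "$log"
    return "$rc"
}
```

For variant 1 this would be called as run_logged /tmp/log /usr/pgsql-9.5/bin/pg_basebackup -U pgreplic -h db-other-site -w -D /opt/geo_stdby_data -c fast -vvv -X stream.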

Result:
Variant 2 (with strace) works fine, return code 0.
Variant 1 (without strace) fails with return code 1.

Any ideas?

Andrej
----------------------details
Output:
Variant 2:
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' FAST NOWAIT
-- Mon May 23 17:54:31 CEST 2016 [l1abrnch->l1abrnch:3122/27282:GEO]
--INFO-- l1abrnch->l1abrnch (GEO-STDBY-DB / stop: 0): target/returned 0/0 (OK)
transaction log start point: 0/FA000028 on timeline 1
pg_basebackup: starting background WAL receiver
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 0/FA000000 TIMELINE 1
WARNING: skipping special file "./pg_hba.conf"
DEBUG: standby "pg_basebackup" has now caught up with primary
DEBUG: write 0/FA000000 flush 0/0 apply 0/0
DEBUG: removing transaction log backup history file "0000000100000000000000F8.00000028.backup"
transaction log end point: 0/FA0000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
RETVAL=0

Output:
Variant 1:
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' FAST NOWAIT
-- Mon May 23 17:55:32 CEST 2016 [l1abrnch->l1abrnch:3122/28785:GEO]
--INFO-- l1abrnch->l1abrnch (GEO-STDBY-DB / stop: 0): target/returned 0/0 (OK)
transaction log start point: 0/FC000028 on timeline 1
pg_basebackup: starting background WAL receiver
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 0/FC000000 TIMELINE 1
WARNING: skipping special file "./pg_hba.conf"
DEBUG: standby "pg_basebackup" has now caught up with primary
DEBUG: write 0/FC000000 flush 0/0 apply 0/0
DEBUG: removing transaction log backup history file "0000000100000000000000FA.00000028.backup"
transaction log end point: 0/FC0000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: could not wait for child process: No child processes
RETVAL=1
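The failing run ends with "could not wait for child process: No child processes". In other words, waiting for pg_basebackup's background (WAL receiver) child returned ECHILD. One hypothesis worth ruling out, offered as an assumption and not as a diagnosis confirmed in this thread: when a process has SIGCHLD set to SIG_IGN, POSIX specifies that its children do not become zombies and that wait calls eventually fail with ECHILD. On Linux an ignored SIGCHLD is also inherited across fork and exec, so a launcher that ignores it would pass that on to the script and to pg_basebackup. Below is a minimal POSIX sh sketch for checking this from inside the launched script. It assumes Linux and the SigIgn hex mask shown in /proc/<pid>/status. mask_has_signal and sigchld_ignored are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if signal number $2 is set in hex mask $1,
# where $1 is the value of a SigIgn/SigBlk/SigCgt line from /proc/<pid>/status.
# Bit (n - 1) of the mask stands for signal n.
mask_has_signal() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Hypothetical check: succeed if SIGCHLD is ignored in processes started from
# this script. awk reads /proc/self/status, i.e. its own entry, so it sees the
# ignored-signal mask inherited from this script, as pg_basebackup would.
# Assumption: SIGCHLD is 17 (x86 and ARM Linux); pass another number as $1
# on architectures where it differs.
sigchld_ignored() {
    local mask
    mask=$(awk '$1 == "SigIgn:" { print $2 }' /proc/self/status)
    [ -n "$mask" ] || return 2   # no /proc or no SigIgn line
    mask_has_signal "$mask" "${1:-17}"
}
```

Running sigchld_ignored just before the pg_basebackup call, and logging the result both in the crm_mon-launched script and in an interactive run, would show whether the two environments differ here. If they do, keep in mind that a non-interactive POSIX shell cannot trap or reset a signal that was already ignored when it started. In that case the change would have to be made outside the script itself.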

2016-04-18 16:12 GMT+02:00 Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>:

> On 04/17/2016 12:13 PM, Andrej Vanek wrote:
>
>> Hello Adrian,
>>
>> I tried using -U without "su", launching it directly as root: same behaviour.
>> In the end I reverted my script to a standard backup (pg_start_backup;
>> rsync; pg_stop_backup). This works; the only downside is possible
>> collisions with the on-line backup/synchronization of the other two
>> nodes on the master node...
>>
>> Back to the pg_basebackup issue: it is clear to me that this is an issue
>> with the environment that launched pg_basebackup, possibly privileges
>> or some kernel parameters/limits. Who knows?
>> Summary: ClusterLabs' crm_mon launches a shell script that starts
>> pg_basebackup, which then fails to do part of its work (pg_basebackup:
>> could not wait for child process: No child processes), probably because
>> of some failing system call.
>>
>> What can I report to ClusterLabs: which system call fails in pg_basebackup?
>>
>
> All I can do is point you at:
>
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
>
>> Best Regards, Andrej
>>
>>
>>
>> 2016-04-17 1:09 GMT+02:00 Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>:
>>
>>
>> Is the su - even necessary?
>>
>> pg_basebackup is a Postgres client program; you can specify the user
>> you want it to connect as using -U.
>>
>> Or do you need the script to run as postgres in order to get
>> permissions on wherever you are creating the backup directory?
>>
>> have to find out why pg_basebackup cannot fork when launched
>> from crm_mon.
>>
>>
>>
>> I assume crm_mon is this:
>>
>> http://linux.die.net/man/8/crm_mon
>>
>> from Pacemaker.
>>
>> I do not use Pacemaker, but I am pretty sure that running what is a
>> monitoring program in daemon mode and then shelling out to another
>> program is not workable. The docs seem to bear this out:
>>
>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#Installation
>>
>>
>> https://github.com/smbambling/pgsql_ha_cluster/wiki/Building-A-Highly-Available-Multi-Node-PostgreSQL-Cluster
>>
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
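The fallback mentioned earlier in the thread (pg_start_backup; rsync; pg_stop_backup) could look roughly like the following POSIX sh sketch. It assumes the following:

* PostgreSQL 9.5's exclusive backup functions.
* A role allowed to call them (in 9.5, a superuser or a role with REPLICATION).
* A pg_hba.conf entry that admits an ordinary (non-replication) connection to the postgres database.
* ssh access to the primary for rsync under the same host name.

backup_sql, low_level_backup, the label geo_stdby, and the remote data-directory argument are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run one SQL statement on the primary ($1 host, $2 role).
# -w never prompts for a password (as in the pg_basebackup call above);
# -X skips ~/.psqlrc.
backup_sql() {
    psql -h "$1" -U "$2" -d postgres -w -X -c "$3"
}

# Hypothetical sketch of an exclusive low-level base backup (PostgreSQL 9.5):
# $1 primary host, $2 role, $3 data directory on the primary, $4 local target.
low_level_backup() {
    local host user remote_pgdata local_dest rc
    host=$1 user=$2 remote_pgdata=$3 local_dest=$4

    # Second argument true requests an immediate checkpoint (like -c fast).
    backup_sql "$host" "$user" "SELECT pg_start_backup('geo_stdby', true);" || return 1

    # Copy the data directory over ssh, leaving out the contents of pg_xlog
    # and postmaster.pid.
    rc=0
    rsync -a --delete --exclude='pg_xlog/*' --exclude=postmaster.pid \
        "$host:$remote_pgdata/" "$local_dest/" || rc=$?

    # Leave backup mode whether or not the copy succeeded.
    backup_sql "$host" "$user" "SELECT pg_stop_backup();" || return 1
    return "$rc"
}
```

An example call would be low_level_backup db-other-site pgreplic /path/to/primary/data /opt/geo_stdby_data, where the primary's data directory is a placeholder. The sketch passes rsync's exit status through unchanged, so a caller may want to handle status 24 (some source files vanished during the transfer) separately.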
