Re: pg_basebackup: return value 1: reason?

From: Andrej Vanek <andrej(dot)vanek(dot)sk(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_basebackup: return value 1: reason?
Date: 2016-04-16 21:24:00
Message-ID: CAFNFRyHhkqug34cG3dpZwSdb+oZs==bM3f1dM1CkRG_D=w9Xzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

my setup is:
1 master
1 synchronous slave (up and running)
2 asynchronous slave (up and running)
Now I'm setting up asynchronous slave: first step of this setup is
pg_basebackup connecting to master.

Indirections are because it is encapsulated in a script. This script is
aimed to serve for automated replication recovery. This script is launched
by crm_mon -d daemon.

You are right to check without indirections: now I tried to run the command
directly from command line without variables- it works fine...

This means: no problem in pg_basebackup itself

but some problem in environment of process launched from crm_mon daemon.
Thanks for your hint (remove indirections). Pg_basebackup works fine when
launched from command-line..
Unfortenutely I have still no clue how to solve the issue- su works. I have
to find out why pg_basebackup cannot fork when launched from crm_mon.

Best Regards, Andrej

2016-04-16 1:17 GMT+02:00 Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>:

> On 04/15/2016 03:28 PM, Andrej Vanek wrote:
>
>> Hello,
>>
>> I tried to run pg_basebackup. Return value is 1.
>>
>> How to find out its reason?
>> (I suspect that some wal after backup is missing- but how to find
>> out the real reason? How to fix it?)
>>
>
> First it is not clear to me where you are taking the backup from, the
> master or the standby?
>
> Second there is a lot of redirection going on. What happens if you run the
> pg_basebackup directly (without doing su - postgres ...) and use hardcoded
> values instead of shell variables?
>
>
>
>> thanks, Andrej
>> --------------details:
>> environment: CentOS 6.7, postgres 9.5.1
>> ( PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
>> 20120313 (Red Hat 4.4.7-16), 64-bit)
>>
>> I tried 2 forms of pg_basebackup (-X fetch and -X stream). Both were
>> issued from a script:
>> # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP}
>> -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X fetch" 2>${LOG_FILE}.stderr
>> >> ${LOG_FILE}
>> # echo $?
>> 1 <--------------pg_basebackup failed!
>> # cat log.stderr
>> # cat /var/log/cluster/geo_repair.log.err
>> transaction log start point: 0/E3000028 on timeline 1
>> WARNING: skipping special file "./pg_hba.conf"
>> WARNING: skipping special file "./pg_hba.conf.save"
>> transaction log end point: 0/E30000F8
>> pg_basebackup: base backup completed <------------------no
>> reason for pg_basebackup failure!
>> # cp /tmp/pg_hba.conf /tmp/postgresql.conf /pg_data/
>> # su - postgres -c "/usr/pgsql-9.5/bin/pg_ctl -D /pg_data/ start"
>> # tail /pg_data/pg_log/postgresql-Fri.log
>> `pg_xlog/0000000100000000000000E2' ->
>> `../backups/arc/0000000100000000000000E2'
>> 2016-04-15 23:15:10 CEST:pgreplic(at)[unknown]:[10667] WARNING: skipping
>> special file "./pg_hba.conf"
>> 2016-04-15 23:15:10 CEST:pgreplic(at)[unknown]:[10667] WARNING: skipping
>> special file "./pg_hba.conf.save" <---------------recorded in
>> pg_log on master node and copied by pg_basebackup (note time difference
>> between two servers)
>> 2016-04-15 23:15:02 CEST:@:[23321] LOG: database system was
>> interrupted; last known up at 2016-04-15 23:15:10 CEST
>> 2016-04-15 23:15:02 CEST:postgres(at)postgres:[23329] FATAL: the database
>> system is starting up
>> 2016-04-15 23:15:03 CEST:@:[23321] LOG: entering standby mode
>> 2016-04-15 23:15:03 CEST:@:[23321] LOG: database system was not
>> properly shut down; automatic recovery in progress <---------something
>> missing from pg_basebackup
>> 2016-04-15 23:15:03 CEST:@:[23321] LOG: redo starts at 0/E3000028
>> 2016-04-15 23:15:03 CEST:@:[23321] LOG: consistent recovery state
>> reached at 0/E4000000
>> 2016-04-15 23:15:03 CEST:@:[23295] LOG: database system is ready to
>> accept read only connections
>> 2016-04-15 23:15:03 CEST:@:[23356] LOG: started streaming WAL from
>> primary at 0/E4000000 on timeline 1
>> -------second trial
>> # su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP}
>> -D ${GEO_STDBY_DATA} -U pgreplic -P -v -X stream"
>> # echo $?
>> 1
>> # cat /var/log/cluster/geo_repair.log.err
>> transaction log start point: 0/E5000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> WARNING: skipping special file "./pg_hba.conf"
>> WARNING: skipping special file "./pg_hba.conf.save"
>> transaction log end point: 0/E50000F8
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: could not wait for child process: No child processes
>> <----what does this mean? I think it failed to start process to
>> fetching wal logs created during backup: but neither on master node
>> neither on pg_basebackup output here is any information about reason..
>> (max_wal_senders on master is 10: I see no reason to fail).
>>
>> postgres logs:
>> `pg_xlog/0000000100000000000000E4' ->
>> `../backups/arc/0000000100000000000000E4'
>> 2016-04-15 23:35:09 CEST:pgreplic(at)[unknown]:[29035] WARNING: skipping
>> special file "./pg_hba.conf"
>> 2016-04-15 23:35:09 CEST:pgreplic(at)[unknown]:[29035] WARNING: skipping
>> special file "./pg_hba.conf.save"
>> 2016-04-15 23:35:01 CEST:@:[28926] LOG: database system was
>> interrupted; last known up at 2016-04-15 23:35:09 CEST
>> 2016-04-15 23:35:01 CEST:postgres(at)postgres:[28938] FATAL: the database
>> system is starting up
>> 2016-04-15 23:35:02 CEST:@:[28926] LOG: entering standby mode
>> 2016-04-15 23:35:02 CEST:@:[28926] LOG: database system was not
>> properly shut down; automatic recovery in progress <------------this
>> means something missing from pg_basebackup
>> 2016-04-15 23:35:02 CEST:@:[28926] LOG: redo starts at 0/E5000028
>> 2016-04-15 23:35:02 CEST:@:[28926] LOG: consistent recovery state
>> reached at 0/E6000000
>> 2016-04-15 23:35:02 CEST:@:[28904] LOG: database system is ready to
>> accept read only connections
>> 2016-04-15 23:35:02 CEST:@:[28989] LOG: started streaming WAL from
>> primary at 0/E6000000 on timeline 1
>>
>> postgres params on master node:
>> log_line_prefix = '%t:%u(at)%d:[%p] '
>> logging_collector = on
>> wal_buffers = 16MB
>> max_wal_size = 200MB
>> log_temp_files = 1MB
>> max_connections = 170
>> shared_buffers = 512MB
>> effective_cache_size = 1500MB
>> work_mem = 48MB
>> log_lock_waits = on
>> log_min_duration_statement = 10000
>> shared_preload_libraries = 'pg_stat_statements'
>> include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
>> wal_level = hot_standby
>> archive_mode = on
>> max_wal_senders = 10
>> hot_standby = on
>> wal_keep_segments = 128
>> archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f'
>> wal_receiver_status_interval = 2
>> max_standby_streaming_delay = -1
>> max_standby_archive_delay = -1
>> restart_after_crash = off
>> hot_standby_feedback = on
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andrej Vanek 2016-04-16 21:28:58 Re: pg_basebackup: return value 1: reason?
Previous Message Sergei Agalakov 2016-04-16 19:33:21 Enhancement request for pg_dump