postgres complete recovery

From: Pavan Kumar <pavan(dot)dba27(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: postgres complete recovery
Date: 2019-04-22 21:07:00
Message-ID: CA+M0sHE9gFPk6AJBjxZXbi6cUYfBmo8mLKACcMxGzLEPnOhA0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Postgres complete recovery.

Hello experts,

i am working on postgres backup and recovery, with pg_basebackup utility.
for some reason i was not able to perform complete recovery, recovery is
completed till the available wal log not not able to
perform complete recovery committed transaction on wal logs.

any one performed complete database recovery ? if yes, kindly share the
procedure.

here is the test case i have done

wal log setting on postgres

=====>>> enable archive log mode for a postgres cluster

wal_level = replica # minimal, replica, or logical
archive_mode = off # enables archiving; off, on, or always
archive_command = 'cp %p /pgdata/archive/5444/%f' # command to
use to archive a logfile segment

======>>>> verify archive log setting

[postgres(at)pgsql-lnx1 datadir_clone]$ psql -p 5444 postgres

postgres=# show archive_mode;
archive_mode
--------------
on
(1 row)

postgres=# show archive_command
postgres-# ;
archive_command
-------------------------------
cp %p /pgdata/archive/5444/%f
(1 row)

postgres=# show wal_level;
wal_level
-----------
replica
(1 row)

===== >>> force manual swtich
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
0/13A7EBC8
(1 row)

postgres=# \! ls -l /pgdata/archive/5444/
total 32768
-rw-------. 1 postgres postgres 16777216 Mar 4 04:53
000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar 4 04:54
000000010000000000000014
postgres=#

to perform hot backup, with low level api

1. login to postgres server with psql
2. enable start backup (select pg_start_backup('hotbackup_20190304');
3. use os command to copy data, tablespace, wal location to backup media
4. stop backup (select pg_stop_backup('hotbackup_20190304');)

pg_basebackup tool .

makes binary copy of database cluster files
it will automatically put database in backup mode and ends once backup is
completed.
it uses stream protocol

below parameter has to be st for pg_basebackup

wal_level = replica # minimal, replica, or logical
archive_mode = off # enables archiving; off, on, or always
archive_command = 'cp %p /pgdata/archive/5444/%f' # command to
use to archive a logfile segment
#max_wal_senders = 10 # max number of walsender processes

modify hba_file.conf
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication postgres md5
host replication postgres 127.0.0.1/32 md5
host replication postgres ::1/128 md5
#Allow remote replication connections from localhost, by a user with the
host replication <username> <ip adderess>/32 md5

restart database
pg_ctl -D $PWD stop

run pg_Basebackup

[postgres(at)pgsql-lnx1 online]$ pg_basebackup -h localhost -p 5444 -D
/pgbkup/backup/fsbackup/online -X stream -v > /tmp/hotbackup.log 2>&1
Password:

note -X is used to copy latest wal segment (16mb) to self start datababse
after restore and recovery

view the file /tmp/hotbackup.log file to see the backup information.
[postgres(at)pgsql-lnx1 online]$ cat /tmp/hotbackup.log
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/19000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: write-ahead log end point: 0/19000130
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[postgres(at)pgsql-lnx1 online]$

[postgres(at)pgsql-lnx1 online]$ pwd
/pgbkup/backup/fsbackup/online
[postgres(at)pgsql-lnx1 online]$ cat backup_label
START WAL LOCATION: 0/19000028 (file 000000010000000000000019)
CHECKPOINT LOCATION: 0/19000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-04-04 05:44:46 CDT
LABEL: pg_basebackup base backup
[postgres(at)pgsql-lnx1 online]$

====== >>>>>perform recovery:

create table test and insert few rows

ltuser=> create table test (eno int);
CREATE TABLE

ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
10000
(1 row)

ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
20000
(1 row)

ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> commit;
WARNING: there is no transaction in progress
COMMIT
ltuser=> select count(*) from test;
count
-------
40000
(1 row)

ltuser=> \! ls -lart /pgdata/archive/5444/
total 114700
drwxrwxr-x. 4 postgres postgres 30 Mar 4 04:41 ..
-rw-------. 1 postgres postgres 16777216 Mar 4 04:53
000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar 4 04:54
000000010000000000000014
-rw-------. 1 postgres postgres 16777216 Mar 4 05:39
000000010000000000000015
-rw-------. 1 postgres postgres 16777216 Mar 4 05:43
000000010000000000000016
-rw-------. 1 postgres postgres 305 Mar 4 05:43
000000010000000000000017.00000028.backup
-rw-------. 1 postgres postgres 16777216 Mar 4 05:43
000000010000000000000017
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000018
-rw-------. 1 postgres postgres 305 Mar 4 05:44
000000010000000000000019.00000028.backup
drwxrwxr-x. 2 postgres postgres 4096 Mar 4 05:44 .
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000019
ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal
total 213004
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001B
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001C
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001D
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001E
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001F
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000020
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000021
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000022
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000024
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000023
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000025
drwx------. 19 postgres postgres 4096 Mar 4 05:39 ..
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000019
-rw-------. 1 postgres postgres 305 Mar 4 05:44
000000010000000000000019.00000028.backup
drwx------. 3 postgres postgres 4096 Mar 4 05:59 .
drwx------. 2 postgres postgres 96 Mar 4 05:59 archive_status
-rw-------. 1 postgres postgres 16777216 Mar 4 06:24
00000001000000000000001A
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> insert into test values (generate_series(1,10000));
INSERT 0 10000
ltuser=> select count(*) from test;
count
-------
70000
(1 row)

ltuser=> \! ls -lart /pgdata/datadir_clone/pg_wal
total 213004
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001B
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001C
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001D
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001E
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
00000001000000000000001F
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000020
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000021
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000022
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000024
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000023
-rw-------. 1 postgres postgres 16777216 Mar 4 03:08
000000010000000000000025
drwx------. 19 postgres postgres 4096 Mar 4 05:39 ..
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000019
-rw-------. 1 postgres postgres 305 Mar 4 05:44
000000010000000000000019.00000028.backup
drwx------. 3 postgres postgres 4096 Mar 4 05:59 .
drwx------. 2 postgres postgres 96 Mar 4 05:59 archive_status
-rw-------. 1 postgres postgres 16777216 Mar 4 06:25
00000001000000000000001A >>>>> file has latest table test table , log
swith not completed so far
ltuser=> \! ls -lart /pgdata/archive/5444/
total 114700
drwxrwxr-x. 4 postgres postgres 30 Mar 4 04:41 ..
-rw-------. 1 postgres postgres 16777216 Mar 4 04:53
000000010000000000000013
-rw-------. 1 postgres postgres 16777216 Mar 4 04:54
000000010000000000000014
-rw-------. 1 postgres postgres 16777216 Mar 4 05:39
000000010000000000000015
-rw-------. 1 postgres postgres 16777216 Mar 4 05:43
000000010000000000000016
-rw-------. 1 postgres postgres 305 Mar 4 05:43
000000010000000000000017.00000028.backup
-rw-------. 1 postgres postgres 16777216 Mar 4 05:43
000000010000000000000017
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000018
-rw-------. 1 postgres postgres 305 Mar 4 05:44
000000010000000000000019.00000028.backup
drwxrwxr-x. 2 postgres postgres 4096 Mar 4 05:44 .
-rw-------. 1 postgres postgres 16777216 Mar 4 05:44
000000010000000000000019
ltuser=> \q
[postgres(at)pgsql-lnx1 pg_wal]$ pwd

==== >> so far 70000 rows are inserted in test table on ltuser database

our backup was completed on
START WAL LOCATION: 0/19000028 (file 000000010000000000000019)
CHECKPOINT LOCATION: 0/19000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-04-04 05:44:46 CDT >>>>>>
LABEL: pg_basebackup base backup

===== >>> lets simulate crash

[postgres(at)pgsql-lnx1 pg_wal]$ ps -ef |grep post
root 1256 1 0 Mar03 ? 00:00:00 /usr/libexec/postfix/master
-w
postfix 1284 1256 0 Mar03 ? 00:00:00 qmgr -l -t unix -u
postgres 4104 1 0 Mar03 ? 00:00:02
/opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir
postgres 4106 4104 0 Mar03 ? 00:00:03 postgres: checkpointer
process
postgres 4107 4104 0 Mar03 ? 00:00:01 postgres: writer process
postgres 4108 4104 0 Mar03 ? 00:00:03 postgres: wal writer process
postgres 4109 4104 0 Mar03 ? 00:00:04 postgres: autovacuum
launcher process
postgres 4110 4104 0 Mar03 ? 00:00:08 postgres: stats collector
process
postgres 4111 4104 0 Mar03 ? 00:00:00 postgres: bgworker: logical
replication launcher
root 12813 12760 0 04:37 pts/3 00:00:00 su - postgres
postgres 12814 12813 0 04:37 pts/3 00:00:00 -bash
postfix 14203 1256 0 05:26 ? 00:00:00 pickup -l -t unix -u
postgres 14546 1 0 05:39 pts/3 00:00:00
/opt/postgres/pghome/10.6/bin/postgres -D /pgdata/datadir_clone
postgres 14548 14546 0 05:39 ? 00:00:00 postgres: checkpointer
process
postgres 14549 14546 0 05:39 ? 00:00:00 postgres: writer process
postgres 14550 14546 0 05:39 ? 00:00:00 postgres: wal writer process
postgres 14551 14546 0 05:39 ? 00:00:00 postgres: autovacuum
launcher process
postgres 14552 14546 0 05:39 ? 00:00:00 postgres: archiver process
last was 000000010000000000000019
postgres 14553 14546 0 05:39 ? 00:00:00 postgres: stats collector
process
postgres 14554 14546 0 05:39 ? 00:00:00 postgres: bgworker: logical
replication launcher
root 15199 15135 0 06:00 pts/2 00:00:00 su - postgres
postgres 15200 15199 0 06:00 pts/2 00:00:00 -bash
postgres 15892 15200 0 06:26 pts/2 00:00:00 ps -ef
postgres 15893 15200 0 06:26 pts/2 00:00:00 grep --color=auto post
[postgres(at)pgsql-lnx1 pg_wal]$ kill -9 14546
===== >>>> remove files under data directory

[postgres(at)pgsql-lnx1 datadir_clone]$ rm -rf *
[postgres(at)pgsql-lnx1 datadir_clone]$ ls -lart
total 4
drwxr-xr-x. 9 postgres postgres 4096 Mar 4 04:41 ..
drwx------. 2 postgres postgres 6 Mar 4 06:28 .

===== >>>> try to start database

[postgres(at)pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start
pg_ctl: directory "/pgdata/datadir_clone" is not a database cluster
directory
[postgres(at)pgsql-lnx1 datadir_clone]$ pwd
/pgdata/datadir_clone
[postgres(at)pgsql-lnx1 datadir_clone]$ psql -p 5444
psql: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5444"?

===== >>> let's restore from backup

[postgres(at)pgsql-lnx1 datadir_clone]$ cd /pgbkup/backup/fsbackup/online/
[postgres(at)pgsql-lnx1 online]$ ls
backup_label global pg_commit_ts pg_hba.conf pg_logical
pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal
postgresql.auto.conf postgresql.conf.orig
base logfile pg_dynshmem pg_ident.conf pg_multixact
pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
postgresql.conf
[postgres(at)pgsql-lnx1 online]$ cp -rp * /pgdata/datadir_clone
==== >>> create recovery command file under data directory
cat recovery.conf
restore_command = 'cp /pgdata/archive/5444/%f %p'
==== >>> start recovery

[postgres(at)pgsql-lnx1 datadir_clone]$ pg_ctl -D $PWD start
waiting for server to start....2019-04-04 06:30:40.174 CDT [16043] LOG:
listening on IPv6 address "::1", port 5444
2019-04-04 06:30:40.174 CDT [16043] LOG: listening on IPv4 address
"127.0.0.1", port 5444
2019-04-04 06:30:40.177 CDT [16043] LOG: listening on Unix socket
"/tmp/.s.PGSQL.5444"
2019-04-04 06:30:40.202 CDT [16044] LOG: database system was interrupted;
last known up at 2019-04-04 05:44:46 CDT
2019-04-04 06:30:40.490 CDT [16044] LOG: starting archive recovery >>>>>>
******
2019-04-04 06:30:40.555 CDT [16044] LOG: restored log file
"000000010000000000000019" from archive
2019-04-04 06:30:40.609 CDT [16044] LOG: redo starts at 0/19000028
2019-04-04 06:30:40.612 CDT [16044] LOG: consistent recovery state reached
at 0/19000130
2019-04-04 06:30:40.613 CDT [16043] LOG: database system is ready to
accept read only connections
cp: cannot stat ‘/pgdata/archive/5444/00000001000000000000001A’: No such
file or directory
2019-04-04 06:30:40.621 CDT [16044] LOG: redo done at 0/19000130
2019-04-04 06:30:40.662 CDT [16044] LOG: restored log file
"000000010000000000000019" from archive >>>.****
done
server started >>>> *****
[postgres(at)pgsql-lnx1 datadir_clone]$ cp: cannot stat
‘/pgdata/archive/5444/00000002.history’: No such file or directory >>>***
error
2019-04-04 06:30:40.721 CDT [16044] LOG: selected new timeline ID: 2
2019-04-04 06:30:40.792 CDT [16044] LOG: archive recovery complete
cp: cannot stat ‘/pgdata/archive/5444/00000001.history’: No such file or
directory
2019-04-04 06:30:40.906 CDT [16043] LOG: database system is ready to
accept connections >>>> *** opened database
===== >>> let's validate

[postgres(at)pgsql-lnx1 datadir_clone]$ psql -p 5444
psql (10.6)
Type "help" for help.

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
-----------+-----------+----------+-------------+-------------+------------------------------
fal1dbd | fal1admin | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
fal1admin=CTc/fal1admin +
| | | | | =T/fal1admin
+
| | | | |
visa_system_role=c/fal1admin
fal2dbd | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres=CTc/postgres +
| | | | | =T/postgres
+
| | | | |
visa_system_role=c/postgres
fal3dbd | fal3admin | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
fal3admin=CTc/fal3admin +
| | | | | =T/fal3admin
ltuser | ltuser | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres=CTc/postgres +
| | | | | =T/postgres
+
| | | | |
visa_system_role=c/postgres
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
+
| | | | |
postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres=CTc/postgres +
| | | | | =c/postgres
(7 rows)

postgres=# \c ltuser ltuser
You are now connected to database "ltuser" as user "ltuser".
ltuser=> \dt
List of relations
Schema | Name | Type | Owner
--------+------------+-------+--------
public | customer | table | ltuser
public | district | table | ltuser
public | history | table | ltuser
public | item | table | ltuser
public | new_order | table | ltuser
public | order_line | table | ltuser
public | orders | table | ltuser
public | stock | table | ltuser
public | warehouse | table | ltuser
(9 rows)

ltuser=>
ltuser=> exit

===== >>>> validation results

since we lost every thing, it only performed recovery till last available
log in hotbackup.
since we lost every thing, it only performed recovery till last available
log in hotbackup.
latest data in current wal log is lost and there we have test table.
lost of latest wal file/log will give some data loss, need to find a way to
have redundant copy on latest wall logfile

--

*Regards,#! Pavan Kumar----------------------------------------------*-
*Sr. Database Administrator..!*
*NEXT GENERATION PROFESSIONALS, LLC*
*Cell # 267-799-3182 # pavan.dba27 (Gtalk) *
*India # 9000459083*

*Take Risks; if you win, you will be very happy. If you lose you will be
Wise *

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message David G. Johnston 2019-04-22 21:20:24 Re: postgres complete recovery
Previous Message pavan95 2019-04-22 13:17:56 Re: Too many serialization errors in production