Re: PgbackRest PointTIme Recovery : server unable to start back

From: KK CHN <kkchn(dot)in(at)gmail(dot)com>
To: Muhammad Ikram <mmikram(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: PgbackRest PointTIme Recovery : server unable to start back
Date: 2024-07-26 06:33:49
Message-ID: CAKgGyB_=dmmEMC=Jrad00Ui2kbBB1+cs3dGMaWdQT6qMqPMDJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

here the log outputs....

When I ran

$sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta
--set=20240719-122703F_20240724-094727I --target-timeline=current
--type=time "--target=2024-07-24 09:44:01.3255+05:30"
--target-action=promote restore

The Edb server log out put shows
...........................................................................
................................................................
2024-07-23 15:34:12 IST ERROR: syntax error at or near "selecct" at
character 1
2024-07-23 15:34:12 IST STATEMENT: selecct current_timestamp
2024-07-23 15:34:26 IST FATAL: database "enterprisedb" does not exist
2024-07-24 09:38:20 IST FATAL: database "enterprisedb" does not exist
2024-07-24 09:42:17 IST FATAL: database "test1" does not exist
2024-07-24 09:43:56 IST FATAL: database "enterprisedb" does not exist
2024-07-24 09:45:07 IST FATAL: database "enterprisedb" does not exist
2024-07-24 09:45:16 IST ERROR: relation "important_table" does not exist
at character 58
2024-07-24 09:45:16 IST STATEMENT: begin; drop table important_table;
commit; select * from important_table;
2024-07-24 09:46:00 IST LOG: checkpoint starting: time
2024-07-24 09:46:02 IST LOG: checkpoint complete: wrote 16 buffers (0.1%);
0 WAL file(s) added, 0 removed, 0 recycled; write=1.506 s, sync=0.001 s,
total=1.509 s; sync files=9, longest=0.001 s, average=0.001 s; distance=93
kB, estimate=171 kB; lsn=0/3C045768, redo lsn=0/3C045730
2024-07-24 09:47:27 IST LOG: checkpoint starting: immediate force wait
2024-07-24 09:47:27 IST LOG: checkpoint complete: wrote 2 buffers (0.0%);
0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s,
total=0.003 s; sync files=1, longest=0.001 s, average=0.001 s;
distance=16106 kB, estimate=16106 kB; lsn=0/3D000060, redo lsn=0/3D000028
2024-07-24 09:47:27.788 P00 INFO: archive-push command begin 2.52.1:
[pg_wal/00000009000000000000003C] --exec-id=34036-c1d6de0b
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-24 09:47:28.130 P00 INFO: pushed WAL file
'00000009000000000000003C' to the archive
2024-07-24 09:47:28.230 P00 INFO: archive-push command end: completed
successfully (444ms)

When I ran $ sudo systemctl start edb-as-16.service

4-07-26 11:32:56 IST LOG: starting PostgreSQL 16.3 (EnterpriseDB Advanced
Server 16.3.0) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1
20231218 (Red Hat 11.4.1-3), 64-bit
2024-07-26 11:32:56 IST LOG: listening on IPv4 address "0.0.0.0", port 5444
2024-07-26 11:32:56 IST LOG: listening on IPv6 address "::", port 5444
2024-07-26 11:32:56 IST LOG: listening on Unix socket "/tmp/.s.PGSQL.5444"
2024-07-26 11:32:56 IST LOG: database system was interrupted; last known
up at 2024-07-24 09:47:27 IST
2024-07-26 11:32:56 IST LOG: starting point-in-time recovery to 2024-07-24
09:44:01.3255+05:30
2024-07-26 11:32:56 IST LOG: starting backup recovery with redo LSN
0/3D000028, checkpoint LSN 0/3D000060, on timeline ID 9
2024-07-26 11:32:56.475 P00 INFO: archive-get command begin 2.52.1:
[00000009.history, pg_wal/RECOVERYHISTORY] --exec-id=43293-1d583a8e
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:56.709 P00 INFO: found 00000009.history in the repo1:
16-1 archive
2024-07-26 11:32:56.809 P00 INFO: archive-get command end: completed
successfully (335ms)
2024-07-26 11:32:56 IST LOG: restored log file "00000009.history" from
archive
2024-07-26 11:32:56.816 P00 INFO: archive-get command begin 2.52.1:
[00000009000000000000003D, pg_wal/RECOVERYXLOG] --exec-id=43295-55c5d6f7
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:56.999 P00 INFO: found 00000009000000000000003D in the
repo1: 16-1 archive
2024-07-26 11:32:57.099 P00 INFO: archive-get command end: completed
successfully (285ms)
2024-07-26 11:32:57 IST LOG: restored log file "00000009000000000000003D"
from archive
2024-07-26 11:32:57 IST LOG: redo starts at 0/3D000028
2024-07-26 11:32:57.114 P00 INFO: archive-get command begin 2.52.1:
[00000009000000000000003E, pg_wal/RECOVERYXLOG] --exec-id=43297-34867c05
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.282 P00 INFO: found 00000009000000000000003E in the
repo1: 16-1 archive
2024-07-26 11:32:57.382 P00 INFO: archive-get command end: completed
successfully (270ms)
2024-07-26 11:32:57 IST LOG: restored log file "00000009000000000000003E"
from archive
2024-07-26 11:32:57.400 P00 INFO: archive-get command begin 2.52.1:
[00000009000000000000003F, pg_wal/RECOVERYXLOG] --exec-id=43299-e2db2e1b
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.521 P00 INFO: unable to find 00000009000000000000003F
in the archive
2024-07-26 11:32:57.621 P00 INFO: archive-get command end: completed
successfully (222ms)
2024-07-26 11:32:57 IST LOG: completed backup recovery with redo LSN
0/3D000028 and end LSN 0/3D000100
2024-07-26 11:32:57 IST LOG: consistent recovery state reached at
0/3D000100
2024-07-26 11:32:57 IST LOG: database system is ready to accept read-only
connections
2024-07-26 11:32:57.632 P00 INFO: archive-get command begin 2.52.1:
[00000009000000000000003F, pg_wal/RECOVERYXLOG] --exec-id=43301-f613dae9
--log-level-console=info --log-level-file=debug
--pg1-path=/var/lib/edb/as16/data --pg-version-force=16
--repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.761 P00 INFO: unable to find 00000009000000000000003F
in the archive
2024-07-26 11:32:57.861 P00 INFO: archive-get command end: completed
successfully (231ms)
2024-07-26 11:32:57 IST LOG: redo done at 0/3E000060 system usage: CPU:
user: 0.00 s, system: 0.00 s, elapsed: 0.75 s
2024-07-26 11:32:57 IST FATAL: recovery ended before configured recovery
target was reached
2024-07-26 11:32:57 IST LOG: startup process (PID 43292) exited with exit
code 1

ONLY inference I can make is

INFO unable to find 00000009000000000000003F in the archive ( This
means the EDB server (10.10.20.6 ) unable to push the archives to the
Repo server(10.10.20.7 ) ? Is that the reason for the recovery and
start backing of edb server fails ?

the pg_hba.conf entry in the EDB Server machine is as

host all all 127.0.0.1/32 ident
host all all 10.10.20.7/32 scram-sha-256
#host all all 10.10.20.7/32 trust
# IPv6 local connections:
host all all ::1/128 ident
#host all all 10.10.20.7/24 trust

# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all peer
host replication all 10.10.20.7/32
scram-sha-256
host replication all 127.0.0.1/32 ident
host replication all ::1/128 ident

Do I have to change anything in pg_hba.conf ?

my EDB Server conf as this

archive_mode = on
archive_command = 'pgbackrest --stanza=Demo_Repo archive-push %p'
log_filename = 'postgresql.log'
max_wal_senders = 5
wal_level = replica

Any help ?

Krishane

On Fri, Jul 26, 2024 at 10:45 AM Muhammad Ikram <mmikram(at)gmail(dot)com> wrote:

> Hi KK CHN
>
> Could you check server logs ?
> Your service trace suggests that it started server and then failure
> happened
>
> ul 26 09:48:49 service01 systemd[1]: Started EDB Postgres Advanced Server
> 16.
> Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Main process
> exited, code=exited, status=1/FAILURE
>
>
>
> Regards,
> Ikram
>
>
> On Fri, Jul 26, 2024 at 10:04 AM KK CHN <kkchn(dot)in(at)gmail(dot)com> wrote:
>
>> List,
>>
>> Reference: https://pgbackrest.org/user-guide-rhel.html#pitr
>> I am following the PTR on RHEL9 EPAS16.
>> I am able to do a backup(Full, diff and incr) and restore from a full
>> backup and restart of EPAS16 works fine.
>>
>> But when I do an incremental backup after doing the procedures
>> mentioned in the PTR section of the above reference link and try
>> restoring the EDB database from the INCR backup and then starting up the
>> EPAS16 the server always results in dead state
>>
>> My repo server is another machine. If I do a full restore on the DB
>> server ( sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta
>> restore) it works and the server starts without any issue.
>> Restoring from Incremental backup tty output shows successful but edb
>> service start results in failure.
>>
>> Any help is much appreciated.
>>
>> Krishane.
>>
>>
>>
>>
>> STEPS followed:
>>
>> after dropping the table pg-primary *⇒* Drop the important table
>> ([section]
>> stopped the EDB server.
>>
>> $ sudo -u enterprisedb pgbackrest --stanza=Demo_Repo --delta
>> --set=20240719-122703F_20240724-094727I --target-timeline=current
>> --type=time "--target=2024-07-24 09:44:01.3255+05:30"
>> --target-action=promote restore
>> .....................................................
>>
>> 2024-07-26 09:48:06.343 P00 INFO: restore command end: completed
>> successfully (1035ms)
>>
>>
>> But
>>
>> [root(at)rservice01 ~]# sudo systemctl start edb-as-16.service
>> [root(at)service01 ~]# sudo systemctl status edb-as-16.service
>> × edb-as-16.service - EDB Postgres Advanced Server 16
>> Loaded: loaded (/etc/systemd/system/edb-as-16.service; disabled;
>> preset: disabled)
>> * Active: failed* (Result: exit-code) since Fri 2024-07-26 09:48:50
>> IST; 8s ago
>> Duration: 242ms
>> Process: 41903 ExecStartPre=/usr/edb/as16/bin/edb-as-16-check-db-dir
>> ${PGDATA} (code=exited, status=0/SUCCESS)
>> Process: 41908 ExecStart=/usr/edb/as16/bin/edb-postgres -D ${PGDATA}
>> (code=exited, status=1/FAILURE)
>> Main PID: 41908 (code=exited, status=1/FAILURE)
>> CPU: 331ms
>>
>> Jul 26 09:48:48 service01 systemd[1]: Starting EDB Postgres Advanced
>> Server 16...
>> Jul 26 09:48:48 service01 edb-postgres[41908]: 2024-07-26 09:48:48 IST
>> LOG: redirecting log output to logging collector process
>> Jul 26 09:48:48 service01 edb-postgres[41908]: 2024-07-26 09:48:48 IST
>> HINT: Future log output will appear in directory "log".
>> Jul 26 09:48:49 service01 systemd[1]: Started EDB Postgres Advanced
>> Server 16.
>> Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Main process
>> exited, code=exited, status=1/FAILURE
>> Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Killing process
>> 41909 (edb-postgres) with signal SIGKILL.
>> Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Failed with
>> result 'exit-code'.
>> [root(at)service01 ~]#
>>
>> why is it unable to perform a restore and recovery from an incr backup ?
>>
>>
>>
>>
>>
>>
>>
>> On The Repo Server
>> [root(at)service02 ~]# sudo -u postgres pgbackrest --stanza=Demo_Repo info
>> stanza: Demo_Repo
>> status: ok
>> cipher: aes-256-cbc
>>
>> db (current)
>> wal archive min/max (16):
>> 000000020000000000000021/0000000B0000000000000041
>>
>> full backup: 20240719-122703F
>> timestamp start/stop: 2024-07-19 12:27:03+05:30 / 2024-07-19
>> 12:27:06+05:30
>> wal start/stop: 00000002000000000000002A /
>> 00000002000000000000002A
>> database size: 61.7MB, database backup size: 61.7MB
>> repo1: backup size: 9.6MB
>>
>> incr backup: 20240719-122703F_20240719-123353I
>> timestamp start/stop: 2024-07-19 12:33:53+05:30 / 2024-07-19
>> 12:33:56+05:30
>> wal start/stop: 00000002000000000000002C /
>> 00000002000000000000002C
>> database size: 61.7MB, database backup size: 6.4MB
>> repo1: backup size: 6.2KB
>> backup reference list: 20240719-122703F
>>
>> diff backup: 20240719-122703F_20240719-123408D
>> timestamp start/stop: 2024-07-19 12:34:08+05:30 / 2024-07-19
>> 12:34:10+05:30
>> wal start/stop: 00000002000000000000002E /
>> 00000002000000000000002E
>> database size: 61.7MB, database backup size: 6.4MB
>> repo1: backup size: 6.4KB
>> backup reference list: 20240719-122703F
>>
>> incr backup: 20240719-122703F_20240723-110212I
>> timestamp start/stop: 2024-07-23 11:02:12+05:30 / 2024-07-23
>> 11:02:15+05:30
>> wal start/stop: 000000070000000000000038 /
>> 000000070000000000000038
>> database size: 48MB, database backup size: 6.4MB
>> repo1: backup size: 9.8KB
>> backup reference list: 20240719-122703F,
>> 20240719-122703F_20240719-123408D
>>
>> incr backup: 20240719-122703F_20240723-141818I
>> timestamp start/stop: 2024-07-23 14:18:18+05:30 / 2024-07-23
>> 14:18:22+05:30
>> wal start/stop: 00000008000000000000003C /
>> 00000008000000000000003C
>> database size: 75.4MB, database backup size: 33.8MB
>> repo1: backup size: 4.7MB
>> backup reference list: 20240719-122703F,
>> 20240719-122703F_20240719-123408D, 20240719-122703F_20240723-110212I
>>
>> incr backup: 20240719-122703F_20240724-094727I
>> timestamp start/stop: 2024-07-24 09:47:27+05:30 / 2024-07-24
>> 09:47:30+05:30
>> wal start/stop: 00000009000000000000003D /
>> 00000009000000000000003D
>> database size: 62MB, database backup size: 11.5MB
>> repo1: backup size: 72KB
>> backup reference list: 20240719-122703F,
>> 20240719-122703F_20240719-123408D, 20240719-122703F_20240723-110212I,
>> 20240719-122703F_20240723-141818
>>
>>
>>
>>
>
> --
> Muhammad Ikram
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Muhammad Ikram 2024-07-26 06:57:39 Re: PgbackRest PointTIme Recovery : server unable to start back
Previous Message Muhammad Ikram 2024-07-26 06:15:42 Re: Slow performance