Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir

From: Meera Nair <mnair(at)commvault(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Cc: Punit Pranesh Koujalgi <pkoujalgi(at)commvault(dot)com>, B Ganesh Kishan <bkishan(at)commvault(dot)com>
Subject: Logical replication type- WAL recovery fails and changes the size of wal segment in archivedir
Date: 2024-06-05 06:36:35
Message-ID: SJ1PR19MB6162FA7FE28CBA7C952147FBBAF92@SJ1PR19MB6162.namprd19.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi team,

With wal_level = 'logical', backup was taken using non-exclusive backup method.
Following procedure here for restore and recovery - PostgreSQL: Documentation: 16: 26.3. Continuous Archiving and Point-in-Time Recovery (PITR)<https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-PITR-RECOVERY>

While starting the PostgreSQL server, below issue is seen:

2024-06-05 11:41:32.369 IST [54369] LOG: restored log file "00000005000000010000006A" from archive
2024-06-05 11:41:33.112 IST [54369] LOG: restored log file "00000005000000010000006B" from archive
cp: cannot stat '/home/pgsql/wmaster/00000005000000010000006C': No such file or directory
2024-06-05 11:41:33.167 IST [54369] LOG: redo done at 1/6B000100
2024-06-05 11:41:33.172 IST [54369] FATAL: archive file "00000005000000010000006B" has wrong size: 0 instead of 16777216
2024-06-05 11:41:33.173 IST [54367] LOG: startup process (PID 54369) exited with exit code 1
2024-06-05 11:41:33.173 IST [54367] LOG: terminating any other active server processes
2024-06-05 11:41:33.174 IST [54375] FATAL: archive command was terminated by signal 3: Quit
2024-06-05 11:41:33.174 IST [54375] DETAIL: The failed archive command was: cp pg_wal/00000005000000010000006B /home/pgsql/wmaster/00000005000000010000006B
2024-06-05 11:41:33.175 IST [54367] LOG: archiver process (PID 54375) exited with exit code 1
2024-06-05 11:41:33.177 IST [54367] LOG: database system is shut down

Here '/home/pgsql/wmaster' is my archivedir (the folder where WAL segments are restored from)

Before attempting start, size of 00000005000000010000006B file was 16 MB.
After failing to detect 00000005000000010000006C, there is a FATAL error saying wrong size for 00000005000000010000006B
Now the size of 00000005000000010000006B is observed as 2 MB. Size of all other WAL segments remain 16 MB.

-rw------- 1 postgres postgres 2359296 Jun 5 11:34 00000005000000010000006B

Why is it changing the size of WAL segment in archive log directory?
All necessary WAL segments are present and 00000005000000010000006C was never archived.

bash-4.2$ cat /home/pgsql/dmaster/backup_label.old
START WAL LOCATION: 1/69000028 (file 000000050000000100000069)
CHECKPOINT LOCATION: 1/69000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2024-05-31 17:39:43 IST
LABEL: pgida_backup_4321_315606_1717157383
START TIMELINE: 5

bash-4.2$ cat /home/pgsql/wmaster/00000005000000010000006B.00000028.backup
START WAL LOCATION: 1/6B000028 (file 00000005000000010000006B)
STOP WAL LOCATION: 1/6B000100 (file 00000005000000010000006B)
CHECKPOINT LOCATION: 1/6B000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2024-05-31 17:40:28 IST
LABEL: pgida_backup_4321_315606_1717157427
START TIMELINE: 5
STOP TIME: 2024-05-31 17:40:28 IST
STOP TIMELINE: 5

bash-4.2$ cat /home/pgsql/wmaster/00000005.history
1 0/3E000000 before 2000-01-01 05:30:00+05:30
2 0/63000000 before 2000-01-01 05:30:00+05:30
3 0/E8000000 no recovery target specified
4 1/68000000 before 2000-01-01 05:30:00+05:30

Despite our efforts to troubleshoot, the problem persists. Please help to find a solution.

Regards,
Meera

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Simon Elbaz 2024-06-05 07:08:56 Re: Long running query causing XID limit breach
Previous Message sud 2024-06-05 06:25:32 Re: Long running query causing XID limit breach