BUG #13876: pg_xlogdump give an error on timeline switch

From: postgresql(dot)org(at)gclough(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13876: pg_xlogdump give an error on timeline switch
Date: 2016-01-19 15:34:48
Message-ID: 20160119153448.2961.55971@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13876
Logged by: Greg Clough
Email address: postgresql(dot)org(at)gclough(dot)com
PostgreSQL version: 9.3.10
Operating system: CentOS v7.2
Description:

When pg_xlogdump encounters a timeline switch in WAL, it marks it as an
"error in WAL record". I'm just guessing that there is an end-of-timeline
marker in the WAL that pg_xlogdump hasn't been programmed to recognize, so
it marks it as an error.

Here is a dump of WAL file that's had a timeline switch as the first thing
after forcing an pg_swtich_xlog():

-bash-4.2$ pwd
/opt/postgres/9.3/db2/pg_xlog

-bash-4.2$ ls -al
total 114704
drwx------. 3 postgres postgres 4096 Jan 19 09:15 .
drwx------. 16 postgres postgres 4096 Jan 19 09:33 ..
-rw-------. 1 postgres postgres 16777216 Jan 19 08:50
000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Jan 19 08:53
000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Jan 19 09:07
000000020000000000000003
-rw-------. 1 postgres postgres 16777216 Jan 19 09:07
000000020000000000000004
-rw-------. 1 postgres postgres 16777216 Jan 19 09:10
000000020000000000000005
-rw-------. 1 postgres postgres 16777216 Jan 19 09:15
000000020000000000000006
-rw-------. 1 postgres postgres 41 Jan 19 08:55 00000002.history
-rw-------. 1 postgres postgres 16777216 Jan 19 09:33
000000030000000000000006
-rw-------. 1 postgres postgres 83 Jan 19 09:15 00000003.history
drwx------. 2 postgres postgres 78 Jan 19 09:15 archive_status

-bash-4.2$ pg_xlogdump 000000020000000000000006
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn:
0/06000028, prev 0/056370D0, bkp: 0000, desc: checkpoint: redo 0/6000028;
tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0;
oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
shutdown
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn:
0/06000090, prev 0/06000028, bkp: 0000, desc: checkpoint: redo 0/6000090;
tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0;
oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
shutdown
pg_xlogdump: FATAL: error in WAL record at 0/6000090: record with zero
length at 0/60000F8

That final message was mis-interpreted as corruption in the WAL, but without
knowing the internals of either the WAL or pg_xlogdump I can't say for sure
that it's not corruption. It's repeatable, so I doubt it's a problem in
PostgreSQL core.

Could someone confirm my understanding, and if so, then could we please get
pg_xlogdump updated to recognize an end-of-timeline?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2016-01-19 15:44:33 Re: BUG #13876: pg_xlogdump give an error on timeline switch
Previous Message Smith, Travis 2016-01-19 15:02:00 Re: BUG #13875: Error explaining query