From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WAL logging problem in 9.4.3? |
Date: | 2015-07-02 22:21:02 |
Message-ID: | 20150702222102.GG30708@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote:
> === Start with an empty database
My guess is you have wal_level = minimal?
> ctmp=# begin;
> BEGIN
> ctmp=# create table test(id serial primary key);
> CREATE TABLE
> ctmp=# truncate table test;
> TRUNCATE TABLE
> ctmp=# commit;
> COMMIT
> ctmp=# select relname, relfilenode from pg_class where relname like 'test%';
> relname | relfilenode
> -------------+-------------
> test | 16389
> test_id_seq | 16387
> test_pkey | 16393
> (3 rows)
>
> === Note the index file is 8KB.
> === At this point nuke the database server (in this case it was simply
> === destroying the container it was running in.
How did you continue from there? The container has persistent storage?
Or are you repapplying the WAL to somewhere else?
> === Dump the xlogs just to show what got recorded. Note there's a
> === truncate for the data file and the index file.
That should be ok.
> martijn(at)martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/ 000000010000000000000001 |grep -wE '16389|16387|16393'
> rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint: redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file create: base/16385/16387
> rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log: rel 1663/16385/16387
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file create: base/16385/16389
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file create: base/16385/16393
> rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log: rel 1663/16385/16387
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file truncate: base/16385/16389 to 0 blocks
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file truncate: base/16385/16393 to 0 blocks
> pg_xlogdump: FATAL: error in WAL record at 0/16BE710: record with zero length at 0/16BE740
Note that the truncate will lead to a new, different, relfilenode.
> === Start the DB up again
>
> database_1 | LOG: database system was interrupted; last known up at 2015-07-02 21:08:05 UTC
> database_1 | LOG: database system was not properly shut down; automatic recovery in progress
> database_1 | LOG: redo starts at 0/16A92A8
> database_1 | LOG: record with zero length at 0/16BE740
> database_1 | LOG: redo done at 0/16BE710
> database_1 | LOG: last completed transaction was at log time 2015-07-02 21:34:45.664989+00
> database_1 | LOG: database system is ready to accept connections
> database_1 | LOG: autovacuum launcher started
>
> === Oops, the index file is empty now
That's probably just the old index file?
> martijn(at)martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393}
> -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:37 /data/postgres/base/16385/16387
> -rw------- 1 messagebus ssl-cert 0 Jul 2 23:34 /data/postgres/base/16385/16389
> -rw------- 1 messagebus ssl-cert 0 Jul 2 23:37 /data/postgres/base/16385/16393
>
> martijn(at)martijn-jessie:$ psql ctmp -h localhost -U username
> Password for user username:
> psql (9.4.3)
> Type "help" for help.
>
> === And now the index is broken. I think the only reason it doesn't
> === complain about the data file is because zero bytes there is OK. But if
> === the table had data before it would be gone now.
>
> ctmp=# select * from test;
> ERROR: could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes
Hm. I can't reproduce this. Can you include a bit more details about how
to reproduce?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-07-02 23:07:45 | Re: bugfix: incomplete implementation of errhidecontext |
Previous Message | Jim Nasby | 2015-07-02 22:20:16 | Determine operator from it's function |