TOAST table repeatedly corrupted

From: Niles Oien <noien(at)nso(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: TOAST table repeatedly corrupted
Date: 2018-05-09 18:54:17
Message-ID: CANQ3m6Nhq8VqtZ8SnC3ZLVdQf_1Upb1R+hsPpbFsS7fJOw+Jqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I have what I think is a bug in postgres. I'd like to help track it down,
if that
is indeed what's going on.

I am running a reasonably recent version of postgres :

data=# select version();
version

----------------------------------------------------------------------------------------------------------
PostgreSQL 9.5.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-17), 64-bit
(1 row)

About every six months, the logs show that vacuuming of
one of our tables (the same table every time) starts to fail :

grep ERROR postgresql-09.csv

2018-05-09 16:14:03.834 GMT,,,27018,,5af31e4b.698a,1,,2018-05-09 16:14:03
GMT,12/611211,0,ERROR,XX001,"invalid page in block 1374551 of relation
base/16384/36298640",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_36298637""",,,,""
2018-05-09 16:15:03.865 GMT,,,27696,,5af31e87.6c30,1,,2018-05-09 16:15:03
GMT,12/611219,0,ERROR,XX001,"invalid page in block 1374551 of relation
base/16384/36298640",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_36298637""",,,,""
2018-05-09 16:16:03.897 GMT,,,31666,,5af31ec3.7bb2,1,,2018-05-09 16:16:03
GMT,10/1330957,0,ERROR,XX001,"invalid page in block 1374551 of relation
base/16384/36298640",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_36298637""",,,,""

So it looks like a TOAST table has been corrupted. I can see which
table the TOAST table is supporting :

data=# select relname from pg_class where reltoastrelid =
'pg_toast.pg_toast_36298637'::regclass;
relname
---------------
rdvtrack_fd05
(1 row)

And sure enough, I now cannot dump that table :

linux> pg_dump -p5432 -Upostgres -t hmi.rdvtrack_fd05 data >
hmi_rdvtrack_fd05.sql
pg_dump: Dumping the contents of table "rdvtrack_fd05" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: compressed data is corrupted
pg_dump: The command was: COPY hmi.rdvtrack_fd05 (recnum, sunum, slotnum,
sessionid, sessionns, cparms_sg000, v_bzero, v_bscale, carrrot, cmlon,
lonhg, lathg, loncm, midtime, crval1, crval2, crval3, t_start, t_stop,
lonspan, coverage, zonalvel, mai, module, source, input, created, bld_vers,
backgrnd, rejectlist, cmlon_index, loncm_index, lathg_index, lonhg_index,
sg_000_file, sg_001_file, history, datavals, missvals, datamin, datamax,
datamean, datarms, dataskew, datakurt, meanmu, meanlat) TO stdout;

I think I can see the disk files where the tables are stored :

data=# select pg_relation_filepath('hmi.rdvtrack_fd05');
pg_relation_filepath
----------------------
base/16384/36298637
(1 row)

data=# select pg_relation_filepath('pg_toast.pg_toast_36298637');
pg_relation_filepath
----------------------
base/16384/36298640
(1 row)

I can run pg_filedump on those two files, in both cases all
the flags seem unremarkable (although I may not be knowing
what to look for) and it runs to the end of the file, giving
me a message like :

*** End of File Encountered. Last Block Read: 131071 ***

I think this is probably a bug? Every time it happens
it affects the same table, hmi.rdvtrack_fd05. The hardware
we're running on is fairly new, disks and memory are
not throwing errors, and we've never had problems with any
other tables. We're running on CentOS 6.8 - again, not brand
new but hardly decrepit, either. I've rebuilt the table
several times. This time I'd like to give the PG community
a chance to chase it down before I do that.

I've put the two data files, the postgres config and
essentially what I've said here online at :

http://netdrms01.nispdc.nso.edu/pgIssue/36298640
http://netdrms01.nispdc.nso.edu/pgIssue/36298637
http://netdrms01.nispdc.nso.edu/pgIssue/postgresql.conf
http://netdrms01.nispdc.nso.edu/pgIssue/PG_ISSUE.txt

If it helps, the TOAST table looks like this :

data=# \d pg_toast.pg_toast_36298637
TOAST table "pg_toast.pg_toast_36298637"
Column | Type
------------+---------
chunk_id | oid
chunk_seq | integer
chunk_data | bytea

And the table the TOAST table is supporting looks like this :

data=# \d hmi.rdvtrack_fd05
Table "hmi.rdvtrack_fd05"
Column | Type | Modifiers
--------------+------------------+-----------
recnum | bigint | not null
sunum | bigint |
slotnum | integer |
sessionid | bigint |
sessionns | text |
cparms_sg000 | text |
v_bzero | double precision |
v_bscale | double precision |
carrrot | integer |
cmlon | real |
lonhg | real |
lathg | real |
loncm | real |
midtime | double precision |
crval1 | real |
crval2 | real |
crval3 | double precision |
t_start | double precision |
t_stop | double precision |
lonspan | real |
coverage | real |
zonalvel | real |
mai | real |
module | text |
source | text |
input | text |
created | double precision |
bld_vers | text |
backgrnd | text |
rejectlist | text |
cmlon_index | bigint |
loncm_index | bigint |
lathg_index | bigint |
lonhg_index | bigint |
sg_000_file | text |
sg_001_file | text |
history | text |
datavals | integer |
missvals | integer |
datamin | real |
datamax | real |
datamean | double precision |
datarms | double precision |
dataskew | double precision |
datakurt | double precision |
meanmu | real |
meanlat | real |
Indexes:
"rdvtrack_fd05_pkey" PRIMARY KEY, btree (recnum)
"rdvtrack_fd05_carrrot" btree (carrrot)
"rdvtrack_fd05_cmlon_index" btree (cmlon_index)
"rdvtrack_fd05_lathg_index" btree (lathg_index)
"rdvtrack_fd05_loncm_index" btree (loncm_index)
"rdvtrack_fd05_lonhg_index" btree (lonhg_index)

Niles Oien, noien(at)nso(dot)edu

--
Niles Oien, National Solar Observatory, Boulder Colorado USA

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2018-05-09 19:31:54 TOAST table repeatedly corrupted
Previous Message Marc Van Olmen 2018-05-09 12:43:31 Re: BUG #15190: Build configure should have option to disable clock_gettime on MacOS 10.12 and up.