Re: TOAST table repeatedly corrupted

From: Niles Oien <noien(at)nso(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: TOAST table repeatedly corrupted
Date: 2018-05-09 21:34:45
Message-ID: CANQ3m6OKUsPZ9c-=5hRBi7_CCVNDy+ED+mJOdwFCPq8u=2nNGA@mail.gmail.com
Lists: pgsql-bugs

Thanks. I don't have checksums on. I'll look into it on the next build.

The file 36298640.10 didn't show anything unusual under pg_filedump.

I'm betting that we are suffering from a now-fixed TOAST issue, if not the
recently fixed one you mentioned. That's probably all the chasing that's
worth doing here given the dated nature of our production box. On our
development box, where we have some room to move, we're running something a
bit newer:

PostgreSQL 10.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-16), 64-bit

Does that need an upgrade to get this week's TOAST fixes, too? I'm not sure
if CentOS's 'yum upgrade' will pick it up. I have the repo pgdg10/7/x86_64
enabled; will the update show up that way?

Thanks,

Niles.

On Wed, May 9, 2018 at 2:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Niles Oien <noien(at)nso(dot)edu> writes:
> > I am running a reasonably recent version of postgres :
> > PostgreSQL 9.5.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> > 20120313 (Red Hat 4.4.7-17), 64-bit
>
> As David said, that's not terribly recent. If you are going to upgrade,
> I'd suggest waiting till tomorrow and grabbing 9.5.13, because we fixed
> a pretty serious TOAST data corruption bug in this week's batch of
> releases. The expected symptoms of it don't match what you're seeing,
> unfortunately, but nonetheless you ought to be using the latest, just
> in case this is an already-fixed issue.
>
> > 2018-05-09 16:14:03.834 GMT,,,27018,,5af31e4b.698a,1,,2018-05-09 16:14:03
> > GMT,12/611211,0,ERROR,XX001,"invalid page in block 1374551 of relation
> > base/16384/36298640",,,,,"automatic vacuum of table
> > ""data.pg_toast.pg_toast_36298637""",,,,""
>
> Block 1374551 would be well past the first segment of the file, since
> in a standard build (1GB segments, 8K blocks) there are only 131072
> pages per segment. This explains why you didn't see any complaints
> from pg_filedump, if you only ran it over the first segment.
>
> If you've not clobbered the DB yet, file 36298640.10 would be what
> to look at, I believe.
>
> > And sure enough, I now cannot dump that table :
> > pg_dump: Error message from server: ERROR: compressed data is corrupted
>
> That's interesting, because it seems to indicate an independent problem.
> The "invalid page" error indicates a bad page header, or possibly a
> page checksum failure; either way the page would not have been allowed
> into the buffer pool. But "compressed data is corrupted" implies that
> we did read a page but the data in it seems messed up. So this evidence
> says you have at least two different corrupted places in that table.
>
> Do you have checksums enabled in this installation? If you're going
> to have to rebuild it, you should probably turn those on (use
> initdb --data-checksums), in hopes of narrowing down what's happening.
>
> > I think this is probably a bug? Every time it happens
> > it affects the same table, hmi.rdvtrack_fd05.
>
> That's mighty suggestive all right, but unfortunately it doesn't
> do much to narrow down the problem :-(
>
> regards, tom lane
>
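
For reference, the segment arithmetic in the quoted reply can be checked with a small sketch. It assumes the standard build Tom describes (1GB segments, 8K blocks); the function name locate_block and the constant names are illustrative only and are not part of PostgreSQL or pg_filedump.

```python
# Sketch: map a relation block number to its on-disk segment file,
# assuming a standard build (8K blocks, 1GB segments).
BLOCK_SIZE = 8192                                   # bytes per page
SEGMENT_SIZE = 1024 ** 3                            # bytes per segment file
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE     # 131072 pages


def locate_block(relfilenode, block):
    """Return (file name, block within that file, byte offset in that file)."""
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are named <relfilenode>.<n>.
    name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return name, block_in_segment, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    # Block number and relfilenode taken from the error message above.
    print(locate_block(36298640, 1374551))
```

With those assumptions, block 1374551 falls in segment 10 (file 36298640.10), at block 63831 within that file, which is the page worth inspecting rather than the file as a whole.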

--
Niles Oien, National Solar Observatory, Boulder Colorado USA
