From: | Chris Travers <chris(dot)travers(at)gmail(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Vladimir Rusinov <vrusinov(at)google(dot)com>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Funny WAL corruption issue |
Date: | 2017-08-11 12:53:36 |
Message-ID: | CAKt_ZfuCDb_yCDVfXafSM5bQDY56FPPLDfdipZAgPFFq8xkCag@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On 10 August 2017 at 15:26, Chris Travers <chris(dot)travers(at)gmail(dot)com> wrote:
> >
> >
> > The bitwise comparison is interesting. Remember the error was:
> >
> > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 000000000001E39C000000E1, offset
> > 1146880
> ...
> > Since this didn't throw a checksum error (we have data checksums
> > disabled but wal records ISTR have a separate CRC check), would this
> > perhaps indicate that the checksum operated over incorrect data?
>
> No checksum error and this "unexpected pageaddr" doesn't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a wal file and the old wal stream had a record boundary
> on the same byte position. So the previous record checksum passed and
> the following record checksum passes but the record header is for a
> different wal stream position.
>
I expect to test this theory shortly.
Assuming it is correct, what can we do to prevent restarts of slaves from
running into it?
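One way to check whether the error above fits the recycled-segment theory is to recompute, from the segment name and offset, the page address the reader expected, and compare it with the one it found. Below is a minimal sketch in Python. It assumes the default 16 MB WAL segments and 8 kB WAL pages, and the helper names are hypothetical, not part of any PostgreSQL tool. The record at 1E39C/E1117FB8 starts 72 bytes before the page boundary at 1E39C/E1118000, so the header being checked is plausibly the one on that record's continuation page.

```python
# Sketch: recompute the expected WAL page address for a segment file + offset.
# Assumption: default 16 MB segments (256 segments per 4 GB "log" id).
WAL_SEG_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Parse an LSN printed as HI/LO hex, e.g. '1E375/61118000'."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format an LSN the way PostgreSQL prints it (%X/%X)."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start_lsn(segname):
    """Start LSN of a 24-hex-digit segment name: timeline(8) log(8) seg(8)."""
    log = int(segname[8:16], 16)
    seg = int(segname[16:24], 16)
    segs_per_log = 0x100000000 // WAL_SEG_SIZE
    return (log * segs_per_log + seg) * WAL_SEG_SIZE


def check_pageaddr(segname, offset, found_text):
    """Compare the expected page address with the one reported as found."""
    expected = segment_start_lsn(segname) + offset
    found = parse_lsn(found_text)
    return {
        "expected": format_lsn(expected),
        "found": format_lsn(found),
        # True if both addresses sit at the same byte offset in a segment
        "same_offset_in_segment": found % WAL_SEG_SIZE == expected % WAL_SEG_SIZE,
        # how many whole segments earlier the found address lies
        "segments_behind": (expected - found) // WAL_SEG_SIZE,
    }


print(check_pageaddr("000000000001E39C000000E1", 1146880, "1E375/61118000"))
```

For the error above the expected address comes out as 1E39C/E1118000, and both it and the found 1E375/61118000 sit at offset 0x118000 within their segments. That is what a renamed, recycled segment would show; random corruption would be unlikely to preserve the in-segment offset exactly.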
> I think you could actually hack xlogdump to ignore this condition and
> keep outputting and you'll see whether the records that follow appear
> to be old wal log data. I haven't actually tried this though.
>
> --
> greg
>
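Greg suggests patching pg_xlogdump. A cruder check that needs no patch is to read the xlp_pageaddr field from every page header in the suspect segment file and list the pages whose address does not match their position. If the file is a recycled segment, every page past the point where writing stopped should report an older address. Below is a minimal sketch. It assumes 8 kB WAL pages and the XLogPageHeaderData field order from PostgreSQL's xlog_internal.h (xlp_magic, xlp_info, xlp_tli, xlp_pageaddr, xlp_rem_len) as laid out on a little-endian 64-bit build. It does not decode records, and scan_pages is a hypothetical name.

```python
import struct

XLOG_BLCKSZ = 8192  # assumption: default WAL page size

# Assumed XLogPageHeaderData layout (little-endian, no padding before the
# 8-byte field because it already falls on an 8-byte boundary):
# uint16 xlp_magic, uint16 xlp_info, uint32 xlp_tli,
# uint64 xlp_pageaddr, uint32 xlp_rem_len
PAGE_HEADER = struct.Struct("<HHIQI")


def scan_pages(data, seg_start_lsn):
    """Return (offset, expected_addr, found_addr) for every page whose
    xlp_pageaddr does not match its position in the segment."""
    mismatches = []
    last = len(data) - PAGE_HEADER.size
    for offset in range(0, last + 1, XLOG_BLCKSZ):
        _magic, _info, _tli, pageaddr, _rem_len = PAGE_HEADER.unpack_from(data, offset)
        expected = seg_start_lsn + offset
        if pageaddr != expected:
            mismatches.append((offset, expected, pageaddr))
    return mismatches
```

For the segment in the error above, this would be called as scan_pages(open(path, "rb").read(), 0x1E39C_E1000000), where path is the location of the segment file. A single mismatch that is followed by correct pages would argue against the recycled-segment explanation. A contiguous run of old addresses to the end of the file would fit it.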
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more