From: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Checksums by default? |
Date: | 2017-02-13 01:29:52 |
Message-ID: | d74501e1-6831-f8cb-f81c-2b5abc12d4ed@BlueTreble.com |
Lists: | pgsql-hackers |
On 2/10/17 6:38 PM, Tomas Vondra wrote:
> And no, backups may not be a suitable solution - the failure happens on
> a standby, and the page (luckily) is not corrupted on the master. Which
> means that perhaps the standby got corrupted by a WAL, which would
> affect the backups too. I can't verify this, though, because the WAL got
> removed from the archive, already. But it's a possibility.
Possibly related... I've got a customer that periodically has SR replicas
stop in their tracks due to WAL checksum failure. I don't think there's
any hardware correlation (they've seen this on multiple machines).
Studying the code, it occurred to me that if there are any bugs in the
handling of individual WAL record sizes or pointers during SR then you
could get CRC failures. So far every one of these occurrences has been
repairable by replacing the broken WAL file on the replica. I've
requested that next time this happens they save the bad WAL.
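
Once a bad WAL file has been saved, one way to look at it would be to
diff it byte-for-byte against the good copy of the same segment from the
master and see where the first mismatch falls. A rough sketch, not
anything from the PostgreSQL tree: the function names are made up for
illustration, and the 8192-byte WAL page size is an assumption (the
default XLOG_BLCKSZ), so adjust it if the build uses something else.

```python
import hashlib

# Assumption: WAL page size is the default XLOG_BLCKSZ of 8192 bytes.
WAL_PAGE_SIZE = 8192


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a full WAL segment is never held in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def first_difference(good_path, bad_path, page_size=WAL_PAGE_SIZE):
    """Return (byte_offset, page_number, offset_in_page) of the first
    differing byte between the two files, or None if they are identical.
    A length mismatch is reported at the end of the shorter file."""
    offset = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(page_size)
            b = bad.read(page_size)
            if not a and not b:
                return None  # both files ended with no mismatch
            if a != b:
                n = min(len(a), len(b))
                # Default to the length-mismatch position, then look for
                # an earlier differing byte within the common prefix.
                pos = offset + n
                for i in range(n):
                    if a[i] != b[i]:
                        pos = offset + i
                        break
                return pos, pos // page_size, pos % page_size
            offset += len(a)
```

Knowing whether the first mismatch sits at a page boundary or in the
middle of a page might help tell a record size/pointer bug apart from a
torn or zeroed block, though that is only a guess until an actual bad
file is available.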
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)