| From: | Magnus Hagander <magnus(at)hagander(dot)net> | 
|---|---|
| To: | David Steele <david(at)pgmasters(dot)net> | 
| Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: pgsql: Validate page level checksums in base backups | 
| Date: | 2018-04-15 12:08:15 | 
| Message-ID: | CABUevExiHaQSKPXsNVZZOQSG+GLEdvEn3L81pfxSgjXUR4bWkw@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-committers pgsql-hackers | 
On Tue, Apr 10, 2018 at 11:45 PM, David Steele <david(at)pgmasters(dot)net> wrote:
> On 4/10/18 5:24 PM, Tomas Vondra wrote:
>
>>
>> I think there's a bug in sendFile(). We do check checksums on all pages
>> that pass this LSN check:
>>
>>      /*
>>       * Only check pages which have not been modified since the
>>       * start of the base backup. Otherwise, they might have been
>>       * written only halfway and the checksum would not be valid.
>>       * However, replaying WAL would reinstate the correct page in
>>       * this case.
>>       */
>>      if (PageGetLSN(page) < startptr)
>>      {
>>          ...
>>      }
>>
>> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
>> too, and we'll try to verify the checksum - but we actually do not set
>> checksums on empty pages.
>>
>> So I think it should be something like this:
>>
>>      if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>>      {
>>          ...
>>      }
>>
>> It might be worth verifying that the page is actually all-zeroes (and
>> not just with corrupted pd_upper value. Not sure it's worth it.
>>
>> I've found this by fairly trivial stress testing - running pgbench and
>> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
>> With the proposed change I see no further failures.
>>
>
> Good catch, Tomas!
>
> I should have seen this since I had the same issue when I implemented this
> feature in pgBackRest.
>
> Anyway, I agree that your fix looks correct.
>
I've applied this fix, along with the same thing in pg_verify_checksums.
-- 
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/>
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Magnus Hagander | 2018-04-15 12:08:16 | pgsql: Don't attempt to verify checksums on new pages | 
| Previous Message | Magnus Hagander | 2018-04-15 11:57:56 | pgsql: Fix build of pg_verify_checksum docs | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Magnus Hagander | 2018-04-15 12:15:42 | Re: Online enabling of checksums | 
| Previous Message | Magnus Hagander | 2018-04-15 11:57:52 | Re: pgsql: Revert "Allow on-line enabling and disabling of data checksums" |