Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-11-10 00:02:17
Message-ID: CAH2-WzkqA3CwdVBn6me8+_sSzKgEcMRMaLGbkPp_DXScxgXcKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Attached is a version of the already existing regression test that both
> reproduces the broken hot chain (and thus failing index lookups) and
> then also the tuple reviving. I don't see any need for letting this run
> with arbitrary permutations.

I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.

> What I'm currently wondering about is how much we need to harden
> postgres against such existing corruption. If e.g. the hot chains are
> broken somebody might have reindexed thinking the problem is fixed - but
> if they then later vacuum everything goes to shit again, with dead rows
> reappearing.

I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?

Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like it slightly later, during
the Bloom filter probe for that HOT chain...but, in fact, it never
gets there with corruption from this bug in practice, AFAIK.)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Andres Freund 2017-11-10 00:17:18 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Robert Haas 2017-11-09 23:10:11 pgsql: Add hash partitioning.

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-11-10 00:06:32 Re: Simplify ACL handling for large objects and removal of superuser() checks
Previous Message Jeff Janes 2017-11-09 23:55:36 Re: pg_basebackup --progress output for batch execution