Quick Links

Re: ALTER TABLE lock strength reduction patch is unsafe

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-19 21:13:09
Message-ID:	BANLkTi=a0dEjCT0BLj6iZ3t4f=bHJ5KJTA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> 4. Backend #2 visits the new, about-to-be-committed version of
>>> pgbench_accounts' pg_class row just before backend #3 commits.
>>> It sees the row as not good and keeps scanning. By the time it
>>> reaches the previous version of the row, however, backend #3
>>> *has* committed. So that version isn't good according to SnapshotNow
>>> either.
>
>> <thinks some more>
>
>> Why isn't this a danger for every pg_class update? For example, it
>> would seem that if VACUUM updates relpages/reltuples, it would be
>> prone to this same hazard.
>
> VACUUM does that with an in-place, nontransactional update. But yes,
> this is a risk for every transactional catalog update.

Well, after various efforts to fix the problem, I notice that there
are two distinct problems brought out by your test case.

One of them is caused by my patch, one of them was already there in
the code - this latter one is actually the hardest to fix.

It took me about an hour to fix the first bug, but its taken a while
of thinking about the second before I realised it was a pre-existing
bug.

The core problem is, as you observed that a pg_class update can cause
rows to be lost with concurrent scans.
We scan pg_class in two ways: to rebuild a relcache entry based on a
relation's oid (easy fix). We also scan pg_class to resolve the name
to oid mapping. The name to oid mapping is performed *without* a lock
on the relation, since we don't know which relation to lock. So the
name lookup can fail if we are in the middle of a pg_class update.
This is an existing potential bug in Postgres unrelated to my patch.
Ref: SearchCatCache()

I've been looking at ways to lock the relation name and namespace
prior to the lookup (or more precisely the hash), but its worth
discussing whether we want that at all?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: ALTER TABLE lock strength reduction patch is unsafe at 2011-06-17 22:41:24 from Tom Lane

Responses

Re: ALTER TABLE lock strength reduction patch is unsafe at 2011-06-20 08:48:49 from Simon Riggs
Re: ALTER TABLE lock strength reduction patch is unsafe at 2011-06-20 13:14:23 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2011-06-19 21:16:13	Re: the big picture for index-only scans
Previous Message	Florian Pflug	2011-06-19 21:10:49	Re: the big picture for index-only scans