Re: ALTER TABLE lock strength reduction patch is unsafe

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE lock strength reduction patch is unsafe
Date: 2011-12-19 17:05:09
Message-ID: CA+TgmoYCAcka193syVqiwDpCbxbwcLr6-7NfS19uCrScup4+sg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> A simpler example shows it better. Tom's idea prevents 2 rows being
> returned, but doesn't prevent zero rows.

I don't think it prevents either one. Suppose we read and return a
tuple, and then someone HOT-updates it and commits. SnapshotNow is
very happy to also return the updated version of that same tuple on
the next iteration of the scan. See commit
c0f03aae0469e758964faac0fb741685170c39a5.
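
To make both failure modes concrete, here is a toy model (plain C,
with visibility boiled down to the two transactions involved -- not
backend code): the same scan returns two rows or zero rows depending
on which version the scan visits before the concurrent commit lands.

#include <stdbool.h>
#include <stdio.h>

typedef struct { int xmin; int xmax; const char *val; } Tuple;

static bool updater_committed;  /* commit state of the concurrent UPDATE */

/* SnapshotNow-style check, re-evaluated at the moment each tuple is
 * visited: visible iff the inserter has committed and the deleter has
 * not.  Xid 1 is a long-committed insert, xid 2 the concurrent update. */
static bool visible_now(const Tuple *t)
{
    bool inserted = (t->xmin == 1) || (t->xmin == 2 && updater_committed);
    bool deleted  = (t->xmax == 2 && updater_committed);
    return inserted && !deleted;
}

static int scan(const Tuple *heap, int n, int commit_before)
{
    int found = 0;
    for (int i = 0; i < n; i++)
    {
        if (i == commit_before)         /* the UPDATE commits mid-scan */
            updater_committed = true;
        if (visible_now(&heap[i]))
            found++;
    }
    return found;
}

int main(void)
{
    /* one logical row: old version (deleted by xid 2), new (made by 2) */
    Tuple old_first[] = { {1, 2, "old"}, {2, 0, "new"} };
    Tuple new_first[] = { {2, 0, "new"}, {1, 2, "old"} };

    updater_committed = false;
    printf("old version visited first: %d rows\n", scan(old_first, 2, 1));
    updater_committed = false;
    printf("new version visited first: %d rows\n", scan(new_first, 2, 1));
    return 0;
}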

> That's a useful improvement, but not the only thing.
>
> ISTM we can run a scan as we do now, without locking. If scan returns
> zero rows we scan again, but take a definition lock first. ALTER TABLE
> holds the definition lock while running, which won't be a problem
> because we would only use that on shorter AT statements.
>
> Definition lock being the type of lock described earlier on this
> thread, that we already have code for. So we have code for all the
> parts we need to make this work.
>
> So that means we'd be able to plug the gaps noted, without reducing
> performance on the common code paths.
>
> I don't see any objections made so far remain valid with that approach.
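
(In code form, the quoted scheme amounts to roughly the following --
all names invented, with scan_catalog() standing in for the real
unlocked catalog scan:)

#include <stdio.h>

/* stand-ins for the real machinery */
static void acquire_definition_lock(void) { }
static void release_definition_lock(void) { }
static int  scan_catalog(void) { return 0; }    /* returns rows found */

static int scan_with_lock_fallback(void)
{
    int nrows = scan_catalog();         /* common path: no lock taken */
    if (nrows > 0)
        return nrows;

    /*
     * Zero rows might mean a concurrent ALTER TABLE got in the way, so
     * take the definition lock (which ALTER TABLE would hold while it
     * runs) and scan once more.
     */
    acquire_definition_lock();
    nrows = scan_catalog();
    release_definition_lock();
    return nrows;
}

int main(void)
{
    printf("rows found: %d\n", scan_with_lock_fallback());
    return 0;
}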

I think a retry loop is a possibly useful tool for solving this
problem, but I'm very skeptical about the definition locks, unless we
can arrange to have them taken just before commit (regardless of when
during the transaction ALTER TABLE was executed) and released
immediately afterwards. What I think might be even better is
something along the lines of what Noah Misch did with RangeVarGetRelid --
don't try to lock out concurrent activity, just detect it and redo the
operation if anything bad happens while we're in progress. In this
case, that would mean having a mechanism to know whether any
concurrent transaction had updated the catalog we're scanning during
the scan.
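
Roughly like this, say -- with a made-up change counter standing in
for whatever invalidation signal we'd actually use to detect that a
concurrent transaction touched the catalog mid-scan:

#include <stdio.h>

/* Bumped by any transaction that updates the catalog in question; a
 * stand-in for the real invalidation signal, whatever that ends up
 * being. */
static volatile unsigned catalog_change_count;

static unsigned scan_catalog(void)
{
    return 42;      /* stand-in for reading the catalog rows */
}

static unsigned scan_detect_redo(void)
{
    for (;;)
    {
        unsigned before = catalog_change_count;
        unsigned result = scan_catalog();

        if (catalog_change_count == before)
            return result;      /* nothing changed under us: done */
        /* a concurrent update landed mid-scan; redo the whole scan */
    }
}

int main(void)
{
    printf("result: %u\n", scan_detect_redo());
    return 0;
}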

Yet another option, which I wonder whether we're dismissing too
lightly, is to just call GetSnapshotData() and do the scan using a
plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow,
but is it expensive enough to worry about?
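
Revisiting the toy model from above: if the commit state is captured
once, when the snapshot is taken, instead of being re-checked per
tuple, the mid-scan commit can no longer change the answer, and
exactly one version of the row qualifies. (Again just a sketch, not
the backend's actual snapshot machinery.)

#include <stdbool.h>
#include <stdio.h>

typedef struct { int xmin; int xmax; } Tuple;

/* Snapshot-style check: the updater's commit state was captured once,
 * when the snapshot was taken, and is never consulted again. */
static bool visible_in_snapshot(const Tuple *t, bool snap_saw_updater)
{
    bool inserted = (t->xmin == 1) || (t->xmin == 2 && snap_saw_updater);
    bool deleted  = (t->xmax == 2 && snap_saw_updater);
    return inserted && !deleted;
}

int main(void)
{
    Tuple heap[] = { {1, 2}, {2, 0} };  /* old and new version of one row */
    bool snap_saw_updater = false;      /* snapshot predates the commit */
    int found = 0;

    for (int i = 0; i < 2; i++)
    {
        /* the update may commit here; the snapshot doesn't move */
        if (visible_in_snapshot(&heap[i], snap_saw_updater))
            found++;
    }
    printf("rows seen: %d\n", found);   /* always exactly 1 */
    return 0;
}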

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
