Re: Reducing some DDL Locks to ShareLock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing some DDL Locks to ShareLock
Date: 2008-11-09 22:12:46
Message-ID: 7321.1226268766@sss.pgh.pa.us
Lists: pgsql-hackers

Reviewing away ...

There's a fairly serious problem with this patch, which is that it
overlooks one of the reasons that index_update_stats can work the
way it does:

* 3. Because we execute CREATE INDEX with just share lock on the parent
* rel (to allow concurrent index creations), an ordinary update could
* suffer a tuple-concurrently-updated failure against another CREATE
* INDEX committing at about the same time. We can avoid that by having
* them both do nontransactional updates (we assume they will both be
^^^^^^^^^^^^^^^^^^^^^^^^^^^
* trying to change the pg_class row to the same thing, so it doesn't
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* matter which goes first).

While the above assumption would still be true for two CreateTrigger
operations happening in parallel, it is surely *not* true for, say,
CreateTrigger and CreateIndex in parallel.

The window for a race condition is actually fairly wide, because the
various functions that call heap_inplace_update mostly start with a
tuple they got from syscache, which means it probably reflects the most
recent commit, and could be missing changes from recent nontransactional
updates. So the second guy to apply his change could wipe out the first
guy's change.
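
To make the lost-update hazard concrete, here is a hypothetical sketch (plain Python, not backend code): each operation starts from a snapshot of the tuple taken before either writes, then writes the whole row back in place, so the later writer silently discards the earlier writer's change. The field names mirror pg_class columns for illustration only.

```python
# Hypothetical illustration of the race: two in-place updaters each start
# from a stale cached copy of the same row and write the full row back.

def apply_inplace_update(stored_row, cached_copy, field, value):
    """Write back the caller's full (possibly stale) copy with one field changed."""
    new_row = dict(cached_copy)
    new_row[field] = value
    stored_row.clear()
    stored_row.update(new_row)  # overwrites fields this caller never touched

# The shared on-disk tuple.
row = {"relhasindex": False, "relhastriggers": False}

# Both operations snapshot the tuple before either writes (the syscache copy).
create_index_copy = dict(row)
create_trigger_copy = dict(row)

apply_inplace_update(row, create_index_copy, "relhasindex", True)
apply_inplace_update(row, create_trigger_copy, "relhastriggers", True)

print(row)  # relhasindex is back to False: the first change was wiped out
```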

I looked at the other existing uses of heap_inplace_update, and I think
there could possibly be a similar issue for vac_update_datfrozenxid(),
though the chance of creating a real problem there seems vanishingly
small. In any case the proposed patch has got lots of chances for
trouble since it introduces several different not-mutually-monotonic
ways of changing a pg_class entry nontransactionally.

I do not think this is a fatal problem, but what it means is that we
have to get some kind of short-term lock on the target tuple and be
sure we read the actual old tuple contents *after* acquiring that lock.
It would seem to be a good idea to fix all the uses of
heap_inplace_update to work that way, even if they seem safe at the
moment.
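
The discipline being proposed looks roughly like this (again a hypothetical Python sketch, not the actual backend API): acquire a short-term lock on the target tuple first, and only then read the current contents before modifying and writing them back, so no update can start from a copy that predates a concurrent update's commit.

```python
# Sketch of the lock-then-reread pattern: the current tuple contents are
# fetched only *after* the per-tuple lock is held, so concurrent updaters
# serialize and neither overwrites the other's field.

import threading

class InplaceTuple:
    def __init__(self, fields):
        self._lock = threading.Lock()
        self._fields = dict(fields)

    def update_field(self, field, value):
        with self._lock:                  # short-term lock on this tuple
            current = dict(self._fields)  # re-read after acquiring the lock
            current[field] = value
            self._fields = current        # in-place (nontransactional) write

row = InplaceTuple({"relhasindex": False, "relhastriggers": False})
row.update_field("relhasindex", True)
row.update_field("relhastriggers", True)
# Both changes survive, regardless of interleaving.
```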

Any thoughts about the best way to do it? My immediate inclination is
to use heap_lock_tuple but it's a bit expensive.

regards, tom lane
