From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ALTER TABLE lock strength reduction patch is unsafe |
Date: | 2011-06-27 08:46:53 |
Message-ID: | BANLkTik+wEm2EWRTMPG4kzeGnO0kfgAfnA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 17, 2011 at 6:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I believe that this is fundamentally unavoidable so long as we use
>>> SnapshotNow to read catalogs --- which is something we've talked about
>>> changing, but it will require a pretty major R&D effort to make it
>>> happen.
>
>> Ouch.
>
>> I wonder if we could avoid this anomaly by taking a throwaway MVCC
>> snapshot at the beginning of each system catalog scan and using it
>> just for the duration of that scan. If nothing that has touched the
>> catalog commits while the scan is open, then this is logically
>> equivalent to SnapshotNow. If something does commit in mid-scan, then
>> we might not get the latest version of the row, but we should end up
>> with exactly one. If it's not the latest one, we'll do the rebuild
>> again upon seeing the next sinval message; in the meantime, the
>> version we're using mustn't be too intolerably bad or it was an error
>> not to use AccessExclusiveLock in the first place.
>
> Yeah, this seems like a possibly workable direction to explore. I like
> this better than what Simon is proposing, because it would fix the
> generic issue for all types of catalog SnapshotNow scans.
>
>> IIUC, the problem with this approach is not correctness but
>> performance. Taking snapshots is (currently) expensive.
>
> Yeah. After mulling it for awhile, what about this idea: we could
> redefine SnapshotNow as a snapshot type that includes a list of
> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
> fill that list from the PGPROC array. Instead, while running a scan
> with SnapshotNow, anytime we determine that a particular XID is
> still-in-progress, we add that XID to the snapshot's list.
> Subsequently, the SnapshotNow code assumes that XID to be
> still-in-progress without consulting its actual state. We reset the XID
> list to empty when starting a new SnapshotNow scan. (We might be able
> to do so less often than that, like only when we do
> AcceptInvalidationMessages, but it's not clear to me that there's any
> real benefit in hanging onto the state longer.)
>
> This costs no performance; if anything it should be faster than now,
> because we'll be replacing expensive transaction state probes with
> relatively-cheap searches of an XID array that should almost always
> be quite short.
>
> With this approach, we would have no serialization anomalies from single
> transactions committing while a scan is in progress. There could be
> anomalies resulting from considering an earlier XID to be in-progress
> while a later XID is considered committed (because we didn't observe
> it until later). So far as I can see offhand, the impact of that would
> be that there might be multiple versions of a tuple that are considered
> good, but never that there would be no version considered good (so long
> as the other XIDs simply updated the tuple and didn't delete it). I
> think this would be all right, since the scan would just seize on the
> first good version it finds. As you argue above, if that's not good
> enough for our purposes then the updater(s) should have taken a stronger
> lock.
I liked this idea, so began to prototype the code. My rough hack is
attached, for the record.
One thing that occurs to me about this is that SnapshotNow with or
without these changes returns the latest committed row and ignores
in-progress changes.
Accepting an older version of the definition will always be
potentially dangerous. I can't see a way of doing this that doesn't
require locking - for changes such as new constraints we need to wait
until in progress changes are complete.
So maybe this idea is worth doing, but I don't think it helps us much
reduce lock levels for DDL.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
snapshotnow_consistent.v1.patch | application/octet-stream | 6.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2011-06-27 09:10:51 | Re: silent_mode and LINUX_OOM_ADJ |
Previous Message | Simon Riggs | 2011-06-27 08:11:52 | Re: ALTER TABLE lock strength reduction patch is unsafe |