Quick Links

Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

From:	Kevin Grittner <kgrittn(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date:	2016-05-31 16:20:43
Message-ID:	CACjxUsPdBAp6E_NNQ6LJrvOaytvbb6TdxsHe2ARxpvSSiAmVUg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-committers pgsql-hackers

On Tue, May 31, 2016 at 10:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?
>
> I think that is the danger.

Well, the *perceived* danger -- since every page in the new index
would be new enough to be recognized as too recently modified to be
safe for a snapshot that could still see the omitted rows, the only
question I'm sorting out on this is how safe it might be to cause
the index to be ignored in planning when using such a snapshot.
That and demonstrating the safe behavior to those not following
closely enough to see what will happen without a demonstration.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is,

> This is a good point, though, I think.

The whole perception of risk in this area seems to be based on
getting that wrong; although the review of this area may yield some
benefit in terms of minimizing false positives.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working. E.g. CLUSTER/ VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which cause the tuple to be dead? Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem? The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.
>
> Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
> is newer that the oldest registered snapshot in the system (based on
> some snapshots being ignored) might get a return value of
> HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD.

Since the override xmin cannot advance past the earliest
transaction which is still active, HEAPTUPLE_DEAD indicates that
the transaction causing the tuple to be dead has completed and the
tuple is irrevocably dead -- even if there are still snapshots
registered which can see it. Seeing HEAPTUPLE_DEAD rather than
HEAPTUPLE_RECENTLY_DEAD is strictly limited to tuples which are
certainly and permanently dead and for which the only possible
references are non-MVCC snapshots or existing snapshots subject to
"snapshot too old" monitoring.

> IndexBuildHeapRangeScan(): We might end up with indexIt = false
> instead of indexIt = true. That should be OK because anyone using the
> old snapshot will see a new page LSN and error out. We might also
> fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a
> problem, but I'm not certain of it.

The latter flag is what I'm currently digging at; but my hope is
that whenever old_snapshot_threshold >= 0 we can set
indexInfo->ii_BrokenHotChain = true to cause the planner to skip
consideration of the index if the snapshot is an "old" one. That
will avoid some false positives (seeing the error when not strictly
necessary). If that works out the way I hope, the only down side
is that a scan using a snapshot from an old transaction or cursor
would use some other index or a heap scan; but we already have that
possibility in some cases -- that seems to be the point of the
flag.

> CheckForSerializableConflictOut: Maybe a problem? If the tuple is
> dead, there's no issue, but if it's recently-dead, there might be.

If the tuple is not visible to the scan, the behavior is unchanged
(a simple return from the function on either HEAPTUPLE_DEAD or
HEAPTUPLE_RECENTLY_DEAD with !visible) and (thus) clearly correct.
If the tuple is visible to us it is currently subject to early
pruning or vacuum (since those operations would get the same
modified xmin) but has not yet had any such treatment since we made
it to this function in the first place. The processing for SSI
purposes would be unaffected by the possibility that there could
later be early pruning/vacuuming.

Thanks for the review and feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold < at 2016-05-31 15:03:12 from Robert Haas

Browse pgsql-committers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2016-05-31 17:56:42	pgsql: Fix whitespace
Previous Message	Tom Lane	2016-05-31 16:05:37	pgsql: Fix typo in CREATE DATABASE syntax synopsis.

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hendrik Visage	2016-05-31 16:22:19	Re: Suggestion for --truncate-tables to pg_restore
Previous Message	Tom Lane	2016-05-31 16:19:18	Re: Statement timeout