From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2021-06-15 19:17:05
Message-ID: CA+TgmoYQOJkki7N7923-Tp5KVE_GR+F1Di77_Vi5imSTioEQ1w@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jun 15, 2021 at 12:51 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So, it's well over a year later, and so far as I can see exactly
> nothing has been done about snapshot_too_old's problems.
Progress has been pretty limited, but not altogether nonexistent.
55b7e2f4d78d8aa7b4a5eae9a0a810601d03c563 fixed, or at least seemed to
fix, the time->XID mapping, which is one of the main things that
Andres said was broken originally. Also, there are patches on this
thread from Thomas Munro to add some test coverage for that case,
another problem Andres noted in his original email. I guess it
wouldn't be too hard to get something committed there, and I'm willing
to do it if Thomas doesn't want to and if there's any prospect of
salvaging the feature.
But that's not clear to me. I'm not clear exactly how many
problems we know about and need to fix in order to keep the feature,
and I'm also not clear how deep the hole goes. Like, if we need to get
a certain number of specific bugs fixed, I might be willing to do
that. If we need to commit to a major rewrite of the current
implementation, that's more than I can do. But I guess I don't
understand exactly how bad the current problems are. Reviewing
complaints from Andres from this thread:
> Looking at TransactionIdLimitedForOldSnapshots() I think the ts ==
> update_ts threshold actually needs to be ts >= update_ts, right now we
> don't handle being newer than the newest bin correctly afaict (mitigated
> by autovacuum=on with naptime=1s doing a snapshot more often). It's hard
> to say, because there's no comments.
This seems specific enough to be analyzed and anything that is broken
can be fixed.
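
To make the boundary concrete, here is a minimal standalone model of a
time->XID bin lookup - the names and structure are hypothetical, not the
code in snapmgr.c - showing why testing only for equality against the
newest bin misbehaves once the caller's timestamp has moved past it:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct
{
    int64_t       ts;       /* minute-aligned timestamp of this bin */
    TransactionId xmin;     /* latest xmin observed during that minute */
} Bin;

/*
 * Hypothetical model only: given bins ordered oldest..newest, return the
 * xmin to use as the early-pruning limit for lookup timestamp "ts".
 */
static bool
lookup_limit(const Bin *bins, int nbins, int64_t ts, TransactionId *limit)
{
    /*
     * The point at issue: with "ts == bins[nbins - 1].ts", a caller whose
     * timestamp is newer than the newest bin (because no newer bin has
     * been created yet) falls through and finds nothing; ">=" clamps it
     * to the newest bin instead.
     */
    if (ts >= bins[nbins - 1].ts)
    {
        *limit = bins[nbins - 1].xmin;
        return true;
    }

    for (int i = nbins - 2; i >= 0; i--)
    {
        if (bins[i].ts <= ts)
        {
            *limit = bins[i].xmin;
            return true;
        }
    }

    return false;               /* ts predates all bins */
}

Whether clamping to the newest bin is actually the right behavior is
precisely the sort of thing the real function needs a comment about.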
> The whole lock nesting is very hazardous. Most (all?)
> TestForOldSnapshot() calls happen with locks on buffers held, and can
> acquire lwlocks itself. In some older branches we do entire *catalog
> searches* with the buffer lwlock held (for RelationHasUnloggedIndex()).
I think it's unclear whether there are live problems in master in this area.
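For anyone who hasn't looked recently, the nesting in question looks
roughly like this - paraphrased from the heap scan path and bufmgr.h, so
treat the details as approximate:

/* heapgetpage(), paraphrased: the check runs with the buffer content
 * lock already held */
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page);

/*
 * TestForOldSnapshot() is an inline that compares PageGetLSN(page)
 * against snapshot->lsn and, if the page is newer, calls
 * TestForOldSnapshot_impl(), which reads the threshold via
 * GetOldSnapshotThresholdTimestamp() - i.e. it takes a spinlock on the
 * old-snapshot control data while we still hold the buffer lock.  The
 * RelationHasUnloggedIndex() catalog lookups Andres mentions were the
 * back-branch version of the same nesting problem, just much worse.
 */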
> GetSnapshotData() using snapshot->lsn = GetXLogInsertRecPtr(); as the
> basis to detect conflicts seems dangerous to me. Isn't that ignoring
> inserts that are already in progress?
Discussion on this point trailed off. Upon rereading, I think Andres
is correct that there's an issue; the snapshot's LSN needs to be set
to a value not older than the last xlog insertion that has been
completed rather than, as now, the last one that has started. I guess
to get that value we would need to do something like
WaitXLogInsertionsToFinish(), or some approximation of it e.g.
GetXLogWriteRecPtr() at the risk of unnecessary snapshot-too-old
errors.
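
In code terms, the one line in question and the kind of localized change
I have in mind would be something like this - a sketch, not a worked-out
patch; GetXLogWriteRecPtr() is the approximation mentioned above:

/* GetSnapshotData(), as it stands (quoted above): */
snapshot->lsn = GetXLogInsertRecPtr();  /* reservation point; insertions
                                         * reserved before this can still
                                         * be in progress */

/*
 * Sketch of the alternative: take a pointer that only covers fully
 * completed insertions, so pruning done by a still-in-progress
 * insertion shows up as PageGetLSN(page) > snapshot->lsn.  Cheap but
 * conservative, hence the risk of unnecessary "snapshot too old"
 * errors; doing the equivalent of WaitXLogInsertionsToFinish() here
 * would be exact but more invasive.
 */
snapshot->lsn = GetXLogWriteRecPtr();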
> * In read-mostly workloads it can trigger errors in sessions that are
> much younger than old_snapshot_threshold, if the transactionid is not
> advancing.
>
> I've not tried to reproduce, but I suspect this can also cause wrong
> query results. Because a later snapshot can have the same xmin as
> older transactions, it sure looks like we can end up with data from an
> older xmin getting removed, but the newer snapshot's whenTaken will
> prevent TestForOldSnapshot_impl() from raising an error.
I haven't really wrapped my head around this one, but it seems
amenable to a localized fix. It basically amounts to a complaint that
GetOldSnapshotThresholdTimestamp() is returning a newer value than it
should. I don't know exactly what's required to make it not do that,
but it doesn't seem intractable.
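For reference, the error test is roughly this (paraphrasing
TestForOldSnapshot_impl(), so treat it as approximate):

/* TestForOldSnapshot_impl(), paraphrased */
if (RelationAllowsEarlyPruning(relation) &&
    snapshot->whenTaken < GetOldSnapshotThresholdTimestamp())
    ereport(ERROR,
            (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
             errmsg("snapshot too old")));

So both of Andres's bullets come down to that comparison: if the
timestamp GetOldSnapshotThresholdTimestamp() hands back doesn't line up
with the threshold the pruning code actually used, we either raise the
error for snapshots that didn't need it or, worse, stay silent for ones
that read pruned data.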
> * I am fairly sure that it can cause crashes (or even data corruption),
> because it assumes that DML never needs to check for old snapshots
> (with no meaningful justification). Leading to heap_update/delete to
> assume the page header is a tuple.
I don't understand the issue here, really. I assume there might be a
wrong word here, because assuming that the page header is a tuple
doesn't sound like a thing that would actually happen. I think one of
the key problems for this feature is figuring out whether you've got
snapshot-too-old checks in all the right places. I think what is being
alleged here is that heap_update() and heap_delete() need them, and
that it's not good enough to rely on the scan that found the tuple to
be updated or deleted having already performed those checks. It is not
clear to me whether that is true, or how it could cause crashes.
Andres may have explained this to me at some point, but if he did I
have unfortunately forgotten.
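Just to pin down what such a check would even look like, here is a
purely hypothetical sketch of heap_update() re-testing the target page
after pinning and locking it - whether this is needed at all, and which
snapshot it should test, is exactly what's unclear to me:

/* Hypothetical sketch only; heap_update() has no such check today. */
block = ItemPointerGetBlockNumber(otid);
buffer = ReadBuffer(relation, block);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buffer);

/*
 * Re-check that early pruning has not removed tuple versions this
 * command's snapshot still needed before deciding the update is safe.
 * Assumes the active snapshot is the right one to test, which is
 * itself part of the question.
 */
TestForOldSnapshot(GetActiveSnapshot(), relation, page);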
My general point here is that I would like to know whether we have a
finite number of reasonably localized bugs or a three-ring disaster
that is unrecoverable no matter what we do. Andres seems to think it
is the latter, and I *think* Peter Geoghegan agrees, but I think that
the point might be worth a little more discussion. I'm unclear whether
Tom's dislike for the feature represents hostility to the concept -
with which I would have to disagree - or a judgement on the quality of
the implementation - which might be justified. For the record, and to
Peter's point, I think it's reasonable to set v15 feature freeze as a
drop-dead date for getting this feature into acceptable shape, but I
would like to try to nail down what we think "acceptable" means in
this context.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com