Re: snapshot too old issues, first around wraparound and then more.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2020-04-01 21:11:11
Message-ID: 20200401211111.4gya2pu4iqqw4suc@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-04-01 15:11:52 -0500, Kevin Grittner wrote:
> On Wed, Apr 1, 2020 at 2:43 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > The thing that makes me really worried is that the contents of the time
> > mapping seem very wrong. I've reproduced query results in a REPEATABLE
> > READ transaction changing (pruned without triggering an error).
>
>
> That is a very big problem. On the sort-of bright side (ironic in light of
> the fact that I'm a big proponent of using serializable transactions), none
> of the uses that I have personally seen of this feature use anything other
> than the default READ COMMITTED isolation level. That might help explain
> the lack of complaints for those using the feature. But yeah, I REALLY
> want to see a solid fix for that!

I don't think it's dependent on RR - it's just a bit easier to verify
that the query results are wrong that way.

> > And I've
> > reproduced rows not getting removed for much longer than than they
> > should, according to old_snapshot_threshold.
> >
> > I suspect one reason for users not noticing either is that
> >
> > a) it's plausible that users of the feature would mostly have
> > long-running queries/transactions querying immutable or insert only
> > data. Those would not notice that, on other tables, rows are getting
> > removed, where access would not trigger the required error.
> >
> > b) I observe long-ish phases were no cleanup is happening (due to
> > oldSnapshotControl->head_timestamp getting updated more often than
> > correct). But if old_snapshot_threshold is small enough in relation to
> > the time the generated bloat becomes problematic, there will still be
> > occasions to actually perform cleanup.
> >
>
> Keep in mind that the real goal of this feature is not to eagerly _see_
> "snapshot too old" errors, but to prevent accidental debilitating bloat due
> to one misbehaving user connection.

I don't think it's an "intentional" inaccuracy issue leading to
this. The map contents are just wrong, in particular the head_timestamp
most of the time is so new that
TransactionIdLimitedForOldSnapshots(). When filling a new bucket,
MaintainOldSnapshotThreshold() unconditionally updates
oldSnapshotControl->head_timestamp to be the current minute, which means
it'll take old_snapshot_threshold minutes till
TransactionIdLimitedForOldSnapshots() even looks at the mapping again.

As far as I can tell, with a large old_snapshot_threshold, it can take a
very long time to get to a head_timestamp that's old enough for
TransactionIdLimitedForOldSnapshots() to do anything. Look at this
trace of a pgbench run with old_snapshot_threshold enabled, showing some of
the debugging output added in the patch upthread.

This is with a threshold of 10min, in a freshly started database:
> 2020-04-01 13:49:00.000 PDT [1268502][2/43571:2068881994] WARNING: head 0 min: filling 1 buckets starting at 0 for whenTaken 1 min, with xmin 2068881994
> 2020-04-01 13:49:00.000 PDT [1268502][2/43571:2068881994] WARNING: old snapshot mapping at "after update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
> entry 0 (ring 0): min 1: xid 2068447214
> entry 1 (ring 1): min 2: xid 2068881994
>
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING: old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
> entry 0 (ring 0): min 1: xid 2068447214
> entry 1 (ring 1): min 2: xid 2068881994
>
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING: head 1 min: updating existing bucket 1 for whenTaken 2 min, with xmin 2069199511
> 2020-04-01 13:50:00.000 PDT [1268505][5/122542:0] WARNING: old snapshot mapping at "after update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
> entry 0 (ring 0): min 1: xid 2068447214
> entry 1 (ring 1): min 2: xid 2069199511
>
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING: old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 20, offset: 0
> entry 0 (ring 0): min 1: xid 2068447214
> entry 1 (ring 1): min 2: xid 2069199511
>
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING: head 1 min: filling 2 buckets starting at 0 for whenTaken 3 min, with xmin 2069516499
> 2020-04-01 13:51:00.000 PDT [1268502][2/202674:2069516501] WARNING: old snapshot mapping at "after update" with head ts: 3, current entries: 4 max entries: 20, offset: 0
> entry 0 (ring 0): min 3: xid 2068447214
> entry 1 (ring 1): min 4: xid 2069199511
> entry 2 (ring 2): min 5: xid 2069516499
> entry 3 (ring 3): min 6: xid 2069516499
> ...
> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING: old snapshot mapping at "before update" with head ts: 7, current entries: 8 max entries: 20, offset: 0
> entry 0 (ring 0): min 7: xid 2068447214
> entry 1 (ring 1): min 8: xid 2071112480
> entry 2 (ring 2): min 9: xid 2071434473
> entry 3 (ring 3): min 10: xid 2071755177
> entry 4 (ring 4): min 11: xid 2072075827
> entry 5 (ring 5): min 12: xid 2072395700
> entry 6 (ring 6): min 13: xid 2072715464
> entry 7 (ring 7): min 14: xid 2073035816

Before the mapping change the database had been running for 15
minutes. But the mapping starts only at 7 minutes from start. And then
is updated to

> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING: head 7 min: filling 8 buckets starting at 0 for whenTaken 15 min, with xmin 2075918093
> 2020-04-01 14:03:00.000 PDT [1268504][4/1158832:2075918094] WARNING: old snapshot mapping at "after update" with head ts: 15, current entries: 16 max entries: 20, offset: 0
> entry 0 (ring 0): min 15: xid 2068447214
> entry 1 (ring 1): min 16: xid 2071112480
> entry 2 (ring 2): min 17: xid 2071434473
> entry 3 (ring 3): min 18: xid 2071755177
> entry 4 (ring 4): min 19: xid 2072075827
> entry 5 (ring 5): min 20: xid 2072395700
> entry 6 (ring 6): min 21: xid 2072715464
> entry 7 (ring 7): min 22: xid 2073035816
> entry 8 (ring 8): min 23: xid 2075918093
> entry 9 (ring 9): min 24: xid 2075918093
> entry 10 (ring 10): min 25: xid 2075918093
> entry 11 (ring 11): min 26: xid 2075918093
> entry 12 (ring 12): min 27: xid 2075918093
> entry 13 (ring 13): min 28: xid 2075918093
> entry 14 (ring 14): min 29: xid 2075918093
> entry 15 (ring 15): min 30: xid 2075918093

be considered having started in that moment. And we expand the size of
the mapping by 8 at the same time, filling the new buckets with the same
xid. Despite there being a continuous workload.

After a few more minutes we get:
> 2020-04-01 14:07:00.000 PDT [1268503][3/1473617:2077202085] WARNING: head 15 min: updating existing bucket 4 for whenTaken 19 min, with xmin 2077202085
> 2020-04-01 14:07:00.000 PDT [1268503][3/1473617:2077202085] WARNING: old snapshot mapping at "after update" with head ts: 15, current entries: 16 max entries: 20, offset: 0
> entry 0 (ring 0): min 15: xid 2068447214
> entry 1 (ring 1): min 16: xid 2076238895
> entry 2 (ring 2): min 17: xid 2076559154
> entry 3 (ring 3): min 18: xid 2076880731
> entry 4 (ring 4): min 19: xid 2077202085
> entry 5 (ring 5): min 20: xid 2072395700
> entry 6 (ring 6): min 21: xid 2072715464
> entry 7 (ring 7): min 22: xid 2073035816
> entry 8 (ring 8): min 23: xid 2075918093
> entry 9 (ring 9): min 24: xid 2075918093
> entry 10 (ring 10): min 25: xid 2075918093
> entry 11 (ring 11): min 26: xid 2075918093
> entry 12 (ring 12): min 27: xid 2075918093
> entry 13 (ring 13): min 28: xid 2075918093
> entry 14 (ring 14): min 29: xid 2075918093
> entry 15 (ring 15): min 30: xid 2075918093
>

Note how the xids are not monotonically ordered. And how IsLimited still
won't be able to make use of the mapping, as the head timestamp is only
4 minutes old (whenTaken == 19 min, head == 15min).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-04-01 21:12:41 Proposal: Expose oldest xmin as SQL function for monitoring
Previous Message Andres Freund 2020-04-01 21:01:55 Re: backup manifests