From: | Kevin Grittner <kgrittn(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: snapshot too old, configured by time |
Date: | 2016-03-30 18:53:30 |
Message-ID: | CACjxUsMPFF5yp6PLyM61JKJc7WEoNdLs9VGDf1KYfDZAWskT1Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Mar 20, 2016 at 6:25 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I haven't read the patch, but I wonder: What are the implications here
> for B-Tree page recycling by VACUUM?
> Offhand, I imagine that there'd be some special considerations. Why is
> it okay that an index scan could land on a deleted page with no
> interlock against VACUUM's page recycling? Or, what prevents that from
> happening in the first place?
When the initial "proof of concept" patch was tested by the
customer, it was not effective due to issues related to what you
raise. Autovacuum workers were blocking due to the page pins for
scans using these old snapshots, causing the bloat to accumulate in
spite of this particular patch. This was addressed, at least to a
degree sufficient for this customer, with this patch:
Basically, for most common cases the "super-exclusive" locking has
been eliminated from btree logic; and I have been happy to see that
Simon has been working on dealing with the corner cases where I
hadn't rooted it out.
> I worry that something weird could happen there. For example, perhaps
> the page LSN on what is actually a newly recycled page could be set
> such that the backend following a stale right spuriously raises a
> "snapshot too old" error.
That particular detail doesn't seem to be a realistic concern,
though -- if a page has been deleted, assigned to a new place in
the index with an LSN corresponding to that action, it would be a
pretty big bug if a right-pointer still referenced it.
> I suggest you consider making amcheck [1] a part of your testing
> strategy. I think that this patch is a good idea, and I'd be happy to
> take feedback from you on how to make amcheck more effective for
> testing this patch in particular.
I'm not sure how that would fit in; could you elaborate?
The biggest contradiction making testing hard is that everyone (and
I mean everyone!) preferred to see this configured by time rather
than number of transactions, so there is no change in behavior
without some sort of wait for elapsed time. But nobody wants to
drive time needed for running regression tests too high. Testing
was far easier when a transaction count was used for configuration.
old_snapshot_threshold = -1 (the default) completely disables the
new behavior, and I basically abused a configuration setting of 0
to mean a few seconds so I could get some basic testing added to
make check-world while keeping the additional time for the tests
(barely) below one minute.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2016-03-30 19:21:36 | Re: snapshot too old, configured by time |
Previous Message | Peter Geoghegan | 2016-03-30 18:23:53 | Re: Using quicksort for every external sort run |