From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2021-06-17 15:31:30
Message-ID: 20210617153129.GL20766@tamriel.snowman.net

Greetings,

* Peter Geoghegan (pg(at)bowt(dot)ie) wrote:
> On Wed, Jun 16, 2021 at 12:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I would think that it wouldn't really matter inside VACUUM -- it would
> > > only really need to be either an opportunistic pruning or an
> > > opportunistic index deletion thing -- probably both. Most of the time
> > > VACUUM doesn't seem to end up doing most of the work of removing
> > > garbage versions. It's mostly useful for "floating garbage", to use
> > > the proper GC memory management term.
> >
> > I don't fully agree with this. For one, there are workloads where VACUUM
> > removes the bulk of the dead tuples.
>
> It's definitely much more important that VACUUM run often when non-HOT
> updates are the norm, and there are lots of them. But that's probably
> not going to be helped all that much by this technique anyway.

I don't follow this argument. Surely there are many, many cases out
there with very few HOT updates but lots of non-HOT updates, which
create lots of dead rows that can't currently be cleaned up if there's
a long-running transaction hanging around.

> Mostly I'm just saying I'd start elsewhere and do heapam later. And
> probably do VACUUM itself last of all, if that usefully cut scope.

I'm not quite following what 'elsewhere' means here, or what it would
entail if it involves cleaning up dead tuples but doesn't involve
heapam. I can sort of follow the idea of working on the routine
page-level cleanup of tuples rather than on VACUUM, except that would
seem to require dealing with the complexities of ctid chains discussed
below, and therefore be a larger and more complicated effort than
tackling VACUUM and, in a first round, cutting scope by explicitly
ignoring ctid chains.
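
To make that scope cut concrete, here's a toy sketch in C. Everything
here is hypothetical (none of these names exist in the backend); the
point is just that dead versions from outright DELETEs have no
successor version, so a first cut could reclaim only those and skip
chain reasoning entirely:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical summary of a dead heap tuple version. In real heapam,
 * an updated tuple's t_ctid points at its successor while a deleted
 * tuple's t_ctid points at itself, which is roughly what this flag
 * would be derived from. */
typedef struct DeadVersion
{
    TransactionId xmax;        /* deleting/updating transaction */
    bool          was_updated; /* true: UPDATE, so a successor exists */
} DeadVersion;

static bool
is_simple_dead_version(const DeadVersion *v)
{
    /* DELETE-only versions have no ctid chain hanging off of them,
     * so they could be reclaimed without any chain logic. */
    return !v->was_updated;
}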

> > For another, slowing down VACUUM
> > can cause a slew of follow-on problems, so being careful to not
> > introduce new bottlenecks is important. And I don't think just doing
> > this optimization as part of on-access pruning is reasonable
> > solution. And it's not like making on-access pruning slower is
> > unproblematic either.

I don't know that slowing down VACUUM, which already goes purposefully
slow by default when run from autovacuum, really needs to be stressed
over, particularly when what we're talking about here is CPU cycles. I
do think it'd make sense to have a heuristic that decides whether we're
going to put in the effort to try to do this kind of pruning. That is,
if the global xmin and the current transaction are only a few thousand
XIDs apart or something along those lines, then don't bother, but if
there have been hundreds of thousands of transactions since, then
enable it (perhaps allowing control over this, or allowing users to
explicitly ask VACUUM to 'work harder' or such).
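
Just as a strawman for what that heuristic might look like (the
function, threshold, and default value are all invented; real code
would also need wraparound-aware XID arithmetic):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Made-up default: don't bother unless the oldest snapshot has fallen
 * at least this many XIDs behind. */
#define EXTRA_PRUNE_MIN_XID_DISTANCE 100000

static bool
worth_extra_pruning(TransactionId global_xmin, TransactionId next_xid)
{
    /* Unsigned subtraction is correct modulo 2^32, matching how XIDs
     * wrap, so long as the true distance is under 2^31. */
    uint32_t xid_distance = next_xid - global_xmin;

    return xid_distance >= EXTRA_PRUNE_MIN_XID_DISTANCE;
}

Something that cheap could be evaluated once per VACUUM, or even
per-page, without showing up on profiles.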

> I think that on-access pruning is much more important because it's the
> only hope we have of keeping the original heap page intact, in the
> sense that there are no non-HOT updates over time, though there may be
> many HOT updates. And no LP_DEAD items ever accumulate. It's not so
> much about cleaning up bloat as it is about *preserving* the heap
> pages in this sense.
>
> If in the long run it's impossible to keep the page intact in this
> sense then we will still have most of our current problems. It might
> not make that much practical difference if we simply delay the problem
> -- we kinda have to prevent it entirely, at least for a given
> workload. So I'm mostly concerned about keeping things stable over
> time, at the level of individual pages.

I do think that's a worthwhile goal, but if we could get some kind of
cleanup happening, that strikes me as better than the nothing we have
today. Which side makes sense to tackle first is certainly a discussion
worth having, but I'd go for "do the simple thing first".

> > But as I said nearby, I think the hardest part is figuring out how to
> > deal with ctid chains, not the efficiency of the xid->visibility lookup
> > (or the collection of data necessary for that lookup).
>
> Definitely true.

It strikes me that stressing over ctid chains, while certainly
something to consider, is at this point putting the cart before the
horse in this discussion: there's not much sense in it if we haven't
actually got the data collection piece figured out and working
(hopefully in a manner that minimizes its overhead), and then worked
out the logic to determine whether a given tuple is actually visible to
any running transaction. As I say above, it seems like it'd be a great
win even if it were initially only able to deal with
'routine'/non-chained cases, and only within VACUUM.
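
To spell out what I mean by that visibility logic, here's a rough and
deliberately conservative sketch, not backend code: it ignores each
snapshot's in-progress XID array, assumes the tuple's inserter and
deleter both committed, and uses plain comparisons where real code
would need wraparound-aware ones.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Boundaries of one running snapshot. */
typedef struct SnapshotBounds
{
    TransactionId xmin;   /* all committed XIDs < xmin are visible */
    TransactionId xmax;   /* all XIDs >= xmax are invisible */
} SnapshotBounds;

/* A dead version is reclaimable if every running snapshot either
 * cannot see its insert or definitely sees its delete. */
static bool
dead_to_all_snapshots(TransactionId tuple_xmin, TransactionId tuple_xmax,
                      const SnapshotBounds *snaps, int nsnaps)
{
    for (int i = 0; i < nsnaps; i++)
    {
        bool insert_invisible = (tuple_xmin >= snaps[i].xmax);
        bool delete_visible = (tuple_xmax < snaps[i].xmin);

        if (!insert_invisible && !delete_visible)
            return false;   /* this snapshot might still see it */
    }
    return true;            /* invisible to every running snapshot */
}

The interesting case is a version created and deleted entirely in the
gap between two snapshots' horizons: it's invisible to everyone, yet
newer than the global xmin, so today's rule can't reclaim it.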

The kind of queue tables I'm thinking of, at least, are ones like those
PgQ uses: https://github.com/pgq/pgq

Now, PgQ already works around this gap of ours by using TRUNCATE and
table rotation, but if we improved things here then it could
potentially be rewritten to use routine DELETEs instead of TRUNCATE.
Even the UPDATEs done to process a batch for a subscriber look to be
non-HOT anyway, due to updating indexed fields (in
pgq.next_batch_custom(), it's setting subscription.sub_batch, which has
a UNIQUE btree on it). It looks like there's a HOT UPDATE on the queue
table when a table swap happens, but that UPDATE wouldn't actually be
necessary if we fixed the issue of routine INSERT/DELETE leading to
tons of dead tuples that can't be VACUUM'd while a long-running
transaction is open, and I doubt that UPDATE was intentionally designed
to take advantage of HOT; it just happened to work that way.

The gist of what I'm trying to get at here is that the use cases I've
seen, where people have put in the effort to work around the
long-running-transaction-vs.-VACUUM issue with hacks like table
swapping and TRUNCATE, aren't cases where there's a lot of HOT updating
happening on the tables that are getting bloated due to VACUUM being
unable to clean up tuples. So, if that's actually the easier thing to
tackle, fantastic: let's do it, and then figure out how to improve on
it to handle the more complicated cases later. (This presumes that it's
actually possible to essentially 'skip' the hard cases and still have a
working implementation, of course.)

Thanks,

Stephen
