Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject:	Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date:	2021-11-24 01:32:25
Message-ID:	20211124013225.d67t32hkcbbbsjjc@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2021-11-23 17:01:20 -0800, Peter Geoghegan wrote:
> > One reason for my doubt is the following:
> >
> > We can set all-visible on a page without a FPW image (well, as long as hint
> > bits aren't logged). There's a significant difference between needing to WAL
> > log FPIs for every heap page or not, and it's not that rare for data to live
> > shorter than autovacuum_freeze_max_age or that limit never being reached.
>
> This sounds like an objection to one specific heuristic, and not an
> objection to the general idea.

I understood you to propose that we do not have separate frozen and
all-visible states. Which I think will be problematic, because of scenarios
like the above.

> The only essential part is "opportunistic freezing during vacuum, when the
> cost is clearly very low, and the benefit is probably high". And so it now
> seems you were making a far more limited statement than I first believed.

I'm on board with freezing when we already dirty the page, and when doing
so doesn't cause an additional FPI. And I don't think I've argued against that
in the past.
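
As a rough illustration of that rule (a sketch only, not vacuumlazy.c code; the
struct, its fields, and the function name are all hypothetical), the decision
could be written as a predicate over per-page facts that vacuum already knows:

```c
#include <stdbool.h>

/*
 * Hypothetical per-page inputs. None of these names exist in PostgreSQL;
 * they only label the conditions discussed above.
 */
typedef struct PageFreezeInputs
{
    bool page_already_dirtied;  /* vacuum modifies the page anyway, e.g. pruning */
    bool freeze_needs_new_fpi;  /* freezing would add a full-page image to WAL */
    bool freeze_required;       /* an xid on the page is past the freeze cutoff */
} PageFreezeInputs;

/*
 * Freeze when it is mandatory, or when it is nearly free: the page is being
 * dirtied regardless and freezing does not cost an additional FPI.
 */
static bool
should_freeze_page(const PageFreezeInputs *in)
{
    if (in->freeze_required)
        return true;

    return in->page_already_dirtied && !in->freeze_needs_new_fpi;
}
```

Under this sketch, a page that is merely all-visible, clean, and not yet past
the cutoff is left alone, which is the case the rest of this mail is about.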

> These all-visible (but not all-frozen) heap pages could be considered
> "tenured", since they have survived at least one full VACUUM cycle
> without being unset. So why not also freeze them based on the
> assumption that they'll probably stay that way forever?

Because it's a potentially massive increase in write volume? E.g. if you have
an insert-only workload, and you discard old data by dropping old partitions,
this will often add yet another rewrite, despite your data likely never
getting old enough to need to be frozen.

Given that we often need to start another vacuum immediately after one
finishes, because the vacuum took long enough for the thresholds for vacuuming
again to be reached, I don't think the (auto-)vacuum count is a good proxy.

Maybe you meant this as a more limited concept, i.e. only doing so when the
percentage of all-visible but not all-frozen pages is small?

We could perhaps do better if we had information about the system-wide
rate of xid throughput and how often / how long past vacuums of a table took.

Greetings,

Andres Freund

In response to

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations at 2021-11-24 01:01:20 from Peter Geoghegan
