
Re: Berserk Autovacuum (let's save next Mandrill)

From:	Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject:	Re: Berserk Autovacuum (let's save next Mandrill)
Date:	2020-03-19 22:11:23
Message-ID:	CAC8Q8tKi3rh4hWTptvfCByqPbDukAVW-aoxvzN1WkhQtgG9_eA@mail.gmail.com
Lists:	pgsql-hackers

> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005? That will be high enough for large
> > tables, which seem to be the main concern here.
>
> Seems low at first blush. On a large-ish table with 1 billion tuples,
> we'd vacuum every 5 million inserts. For many ETL workloads this will
> result in a vacuum after every bulk operation. Potentially with an index
> scan associated (even if there are no errors, a lot of bulk loads use
> INSERT ... ON CONFLICT, leading to the occasional update).

This is a good and desirable thing. Upthread it was already pointed out
that "everyone knows to vacuum after bulk operations". This way, vacuum
processes the data while it is still hot in caches, rather than later,
when it has to be read back from disk.
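The arithmetic behind the "every 5 million inserts" figure can be sketched as follows. It assumes the insert-triggered threshold takes the same shape as the existing update/delete trigger (a base threshold plus a scale factor times the table's tuple count). The function name and the zero base threshold are hypothetical choices for illustration; 0.2 is the default of the existing autovacuum_vacuum_scale_factor, shown only for contrast.

```python
def insert_vacuum_threshold(reltuples: float, scale_factor: float, base_threshold: int = 0) -> float:
    """Inserts needed before a vacuum triggers.

    Assumption: threshold = base_threshold + scale_factor * reltuples,
    mirroring the documented formula for the update/delete trigger.
    """
    return base_threshold + scale_factor * reltuples


# Hypothetical table size taken from the thread: 1 billion tuples.
TABLE_TUPLES = 1_000_000_000

# 0.005 is the proposed value; 0.2 is the existing autovacuum_vacuum_scale_factor default.
for scale_factor in (0.005, 0.2):
    inserts = insert_vacuum_threshold(TABLE_TUPLES, scale_factor)
    print(f"scale_factor={scale_factor}: vacuum every {inserts:,.0f} inserts")
```

With 0.005 this gives one vacuum per 5,000,000 inserts, which is why a single large bulk load is likely to trigger one.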

> > I am not sure about b). In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> This whole chain of arguments seems like it actually has little to do
> with vacuuming insert only/mostly tables. The same problem exists for
> tables that aren't insert only/mostly. Instead it IMO is an argument for
> a general change in logic about when to freeze.
>
> What exactly is it that you want to achieve by having anti-wrap vacuums
> be quicker? If the goal is to reduce the window in which autovacuums
> aren't automatically cancelled when there's a conflicting lock request,
> or in which autovacuum just schedules based on xid age, then you can't
> have wraparound vacuums needing to do a substantial amount of work.
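One reading of the "at most one fourth" estimate, using PostgreSQL's defaults (vacuum_freeze_min_age = 50 million, autovacuum_freeze_max_age = 200 million): with a constant insert rate, the number of tuples is proportional to the XID span they cover, so comparing the XID spans that may still be unfrozen when an anti-wraparound vacuum starts gives the ratio directly. This is a simplified sketch that ignores tuples inserted since the last regular vacuum; the helper names are hypothetical.

```python
# PostgreSQL defaults for the two settings (both measured in transaction IDs).
VACUUM_FREEZE_MIN_AGE = 50_000_000
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def unfrozen_xid_span(freeze_min_age: int, freeze_max_age: int, regular_freezing: bool) -> int:
    """XID span whose tuples may still be unfrozen when an anti-wraparound vacuum starts.

    Without earlier vacuums (today's insert-only case), the whole span up to
    freeze_max_age is candidate work. If earlier insert-triggered vacuums froze
    everything older than freeze_min_age, only the youngest freeze_min_age XIDs remain.
    """
    return freeze_min_age if regular_freezing else freeze_max_age


def work_ratio(freeze_min_age: int, freeze_max_age: int) -> float:
    """Assuming linear growth, tuple counts scale with XID span, so spans compare directly."""
    with_freezing = unfrozen_xid_span(freeze_min_age, freeze_max_age, regular_freezing=True)
    today = unfrozen_xid_span(freeze_min_age, freeze_max_age, regular_freezing=False)
    return with_freezing / today


print(work_ratio(VACUUM_FREEZE_MIN_AGE, AUTOVACUUM_FREEZE_MAX_AGE))
```

With the defaults the ratio is 50M / 200M, i.e. one fourth; changing either setting changes the bound proportionally.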

The problem Mandrill hit is simple: in modern cloud environments it is
sometimes simply impossible to read all the data on disk because of
various kinds of throttling.
At some point your production database just shuts down and asks you to
run VACUUM in single-user mode for 40 days.

You want vacuum to happen long before that, preferably while the data
is still in RAM or, at least, while the work fits within your cloud
provider's disk burst performance budget, where the block device
performs like an SSD rather than a floppy disk.
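To see why throttling matters, a rough estimate of how long one full read of a table takes under a given IOPS limit is enough. The table size, the two IOPS figures, and the one-page-per-I/O assumption are all hypothetical; real devices often merge sequential reads into larger I/Os, so this is a pessimistic sketch rather than a model of any specific provider.

```python
PAGE_SIZE_BYTES = 8192  # PostgreSQL's default block size


def full_scan_hours(table_bytes: int, iops: float, bytes_per_io: int = PAGE_SIZE_BYTES) -> float:
    """Hours to read table_bytes once, assuming each I/O fetches bytes_per_io
    and the device sustains exactly `iops` operations per second."""
    ios = table_bytes / bytes_per_io
    return ios / iops / 3600


# Hypothetical numbers, for illustration only.
TABLE_BYTES = 2 * 1024**4  # 2 TiB table
for label, iops in [("burst", 3000), ("throttled baseline", 300)]:
    print(f"{label}: {full_scan_hours(TABLE_BYTES, iops):.1f} h")
```

The scan time scales inversely with the IOPS limit, so the same vacuum that finishes in about a day while burst credits last takes roughly ten times as long once the device drops to a baseline that is ten times lower.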

Some more reading on how that works:
https://aws.amazon.com/ru/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

In response to

Re: Berserk Autovacuum (let's save next Mandrill) at 2020-03-19 21:38:19 from Andres Freund

Responses

Re: Berserk Autovacuum (let's save next Mandrill) at 2020-03-19 22:27:11 from Andres Freund
