Re: PG12 autovac issues

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Justin King <kingpin867(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: PG12 autovac issues
Date: 2020-03-17 23:57:32
Message-ID: CAMkU=1yCz0ww5Fk8K22Fx12vSACAaUhX8X8eQV5ETJ6Oq5bL5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

On Tue, Mar 17, 2020 at 6:19 PM Justin King <kingpin867(at)gmail(dot)com> wrote:

> We have a database that isn't overly large (~20G), but gets incredibly
> frequent updates. Here's an example table:
>
> feedi=# select * from pg_stat_all_tables where schemaname =
> 'production' and relname = 'tita';
>

> n_tup_upd = 4207076934
> n_tup_hot_upd = 4073821079
> n_live_tup = 79942
> n_dead_tup = 71969
> n_mod_since_analyze = 12020
>

> As you can see in this table, there are only ~80K rows, but billions
> of updates.

But how long were those billions of updates spread over? You need to look
at deltas, not absolute values. And note that almost all of those updates
where HOT updates, which don't generate "vacuum debt"

> What we have observed is that the frozenxid reaches the
> 200M mark fairly quickly because of the amount of activity. What is
> interesting is that this happens with the 'postgres' and 'template1'
> databases as well and there is absolutely no activity in those
> databases.
>
> When the 'postgres' and/or 'template1' databases hit the
> freeze_max_age, there are cases where it kicks off an aggressive
> autovac of those tables which seems to prevent autovacs from running
> elsewhere.

Yes, it is a known long-outstanding bug (or malfeature) that one
database reaching autovacuum_freeze_max_age will starve all other databases
of autovac attention. But since the introduction of the "freeze map" in
9.6, it is hard to see how this starvation due to an inactive database
hitting autovacuum_freeze_max_age can last for any meaningful amount of
time. Maybe a shared catalog?

Oddly, this is not consistent, but that condition seems to
> be required. We have observed this across multiple PG12 servers (dev,
> test, staging, production) all with similar workloads.
>
> $ grep -i vacuum /var/log/postgresql/postgres.log | cut -b 1-9 | uniq -c
>

It is hard to figure out what the significance of the occurrence of the
word 'vacuum' in the log file is, without being intimately familiar with
your log files. Could you interpret this some more for us? How many of
those are for 'tita'? How many for databases other than your active one?

Cheers,

Jeff

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Andres Freund 2020-03-18 01:19:52 Re: PG12 autovac issues
Previous Message Adrian Klaver 2020-03-17 23:16:31 Re: Fwd: PG12 autovac issues

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2020-03-18 01:19:52 Re: PG12 autovac issues
Previous Message Adrian Klaver 2020-03-17 23:16:31 Re: Fwd: PG12 autovac issues