Re: PG12 autovac issues

From: Justin King <kingpin867(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PG12 autovac issues
Date: 2020-03-20 17:42:31
Message-ID: CAE39h23xR9F8qbo5vj9+KL1nFnnTaYJrFde0BtWczQy_+De39Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

On Thu, Mar 19, 2020 at 6:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2020-03-19 18:07:14 -0500, Justin King wrote:
> > On Thu, Mar 19, 2020 at 5:35 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-03-19 10:23:48 -0500, Justin King wrote:
> > > > > From a single stats snapshot we can't actually understand the actual xid
> > > > > consumption - is it actually the xid usage that triggers the vacuums?
> > > >
> > > > We have looked at this and the xid consumption averages around 1250
> > > > xid/sec -- this is when we see the "aggressive" autovac kick off in
> > > > the logs. What I don't understand is why these xid's are being
> > > > consumed at this rate on the databases with no activity (postgres,
> > > > template1).
> > >
> > > The xid counter is global across all databases.
> >
> > Then what does the "age" value represent for each database in this
> > case? Perhaps I'm misunderstanding what I'm looking at?
> >
> > postgres=# SELECT datname, age(datfrozenxid),
> > current_setting('autovacuum_freeze_max_age') FROM pg_database;
> > datname | age | current_setting
> > -----------+-----------+-----------------
> > postgres | 100937449 | 200000000
> > template1 | 50244438 | 200000000
> > template0 | 160207297 | 200000000
> > feedi | 150147602 | 200000000
>
> Look at datfrozenxid without the age(). age(xid) computes how "old" xid
> is compared to the "next" xid to be assigned. Until vacuum comes around
> and performs work, pg_database.datfrozenxid / pg_class.relfrozenxid are
> constant, since they represent the values actually present in the
> table. But if xids are being consumed, their "age" increases, because
> they're further and further in the past relative to the "newest" xids.
>
>
> > > One big difference between a manual VACUUM and autovacuum is that with
> > > the default settings VACUUM is not throttled, but autovacuum is.
> > >
> > > What are your vacuum_cost_delay, autovacuum_vacuum_cost_delay,
> > > vacuum_cost_limit, autovacuum_vacuum_cost_limit, vacuum_cost_page_hit,
> > > vacuum_cost_page_miss set to?
> >
> > Here are all the vacuum related values for the server:
> >
> > postgres=# select name,setting from pg_settings where name like '%vacuum%';
> > autovacuum = on
> > autovacuum_analyze_scale_factor = 0.1
> > autovacuum_analyze_threshold = 2500
> > autovacuum_freeze_max_age = 200000000
> > autovacuum_max_workers = 8
> > autovacuum_multixact_freeze_max_age = 400000000
> > autovacuum_naptime = 15
> > autovacuum_vacuum_cost_delay = 20
> > autovacuum_vacuum_cost_limit = -1
> > autovacuum_vacuum_scale_factor = 0.2
> > autovacuum_vacuum_threshold = 500
> > autovacuum_work_mem = -1
> > log_autovacuum_min_duration = 0
> > vacuum_cleanup_index_scale_factor = 0.1
> > vacuum_cost_delay = 0
> > vacuum_cost_limit = 1000
> > vacuum_cost_page_dirty = 20
> > vacuum_cost_page_hit = 1
> > vacuum_cost_page_miss = 10
> > vacuum_defer_cleanup_age = 0
> > vacuum_freeze_min_age = 50000000
> > vacuum_freeze_table_age = 150000000
> > vacuum_multixact_freeze_min_age = 5000000
> > vacuum_multixact_freeze_table_age = 150000000
> >
> > I know the database is busy, so the throttling makes sense, but it
> > seems like it would complete eventually.
>
> The cost limit/delay are way too long/small respectively for a busy
> postgres instance.

This does make sense, we will look into adjusting those values.

> > We see blocked autovacs for many hours.
>
> On the same table, or just generally being busy?

We haven't isolated *which* table it is blocked on (assuming it is),
but all autovac's cease running until we manually intervene.

When we get into this state again, is there some other information
(other than what is in pg_stat_statement or pg_stat_activity) that
would be useful for folks here to help understand what is going on?
>
> Greetings,
>
> Andres Freund

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Andres Freund 2020-03-20 19:03:17 Re: PG12 autovac issues
Previous Message Andres Freund 2020-03-19 23:56:26 Re: PG12 autovac issues

Browse pgsql-general by date

  From Date Subject
Next Message Matt Magoffin 2020-03-20 18:34:35 Re: Duplicate key violation on upsert
Previous Message Adrian Klaver 2020-03-20 17:26:15 Re: Could postgres12 support millions of sequences? (like 10 million)