Quick Links

Re: Diagnosing outdated auto analyze dates...

From:	Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To:	Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
Cc:	pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject:	Re: Diagnosing outdated auto analyze dates...
Date:	2021-04-21 21:57:43
Message-ID:	CAODZiv5DkYC2XMH81wWxeg0hxAt5CK5=rbnkS0LBn-3W0agS4w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

On Wed, Apr 21, 2021 at 5:51 PM Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
wrote:

>
>
> On Wed, Apr 21, 2021 at 5:38 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
> wrote:
>
>> Thanks, really useful detail. One last question: if I go the route of
>> setting auto-vac scale factor to 0 and auto-vac threshold to 10,000 to
>> ensure vacuuming every 10,000 dead tuples, is there still value in
>> adjusting the auto-analyze threshold/scale separately? Does the analyze
>> process do anything other than determine whether it should be vacuumed?
>>
>

And I generally set the analyze to run slightly more often than the vacuum
myself. Unless it's a very low value for the threshold relative to the size
of the table. Then them being the same is usually ok.

>
> Just to clarify, I only recommend setting scale factor to zero on an
> individual table basis and actually figuring out appropriate values for the
> threshold to cause those individual tables to be scanned/vacuumed on a
> regular basis to keep up with your write rate. The cluster-wide values for
> scale factor and threshold that are set in postgresql.conf are generally ok
> for most databases, but I personally tune scale factor down a little bit
> from the default and threshold up a little bit from the default. But you
> definitely want a scale-factor set for the cluster-wide value to ensure any
> write tables get analyze/vacuumed at some point based on a fraction of them
> changing, not just a specific row number change.
>
> Vacuum is not just for dead-tuple cleanup. It keeps many other important
> statistics updated (freespace map, visibility map, etc) and, most
> importantly, prevents transaction id exhaustion.
>
> https://www.postgresql.org/docs/13/routine-vacuuming.html
>
>
>> On Wed, Apr 21, 2021 at 2:33 PM Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Apr 21, 2021 at 5:05 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Thank you... so to make it explicit, this table has 15169372 rows,
>>>> autovacuum_analyze_scale_factor is .05, and autovacuum_analyze_threshold is
>>>> 10, so: 15169372 * .05 + 10, which is 758478.60 -- so that means
>>>> auto-analyze will only fire when that number of rows has been deleted or
>>>> updated?
>>>>
>>>>
>>>
>>> Correct. One thing that can be troublesome with larger tables and the
>>> scale factor is that, as time goes on, analyze and vacuum run less and less
>>> often unless your write rate also increases along with the size. So that's
>>> why larger tables can sometimes benefit from individual tuning and setting
>>> scale factor to zero.
>>>
>>> To figure out what to set the threshold to, I typically make a simple
>>> script to do a csv copy output with psql for the pg_stat_user_tables entry
>>> for the table(s) involved on like an hourly interval for 2-3 weeks. Then
>>> pull that into a spreadsheet or import it back into a table. Then figure
>>> out the hourly & daily differences for n_tup_upd + n_tup_del. Then try and
>>> set the thresholds to have autovac run at least once or twice a day and
>>> analyze slightly more often.
>>>
>>> Note that prior to PG 13, insert only tables would never cause autovac
>>> to kick in. 13 now has autovacuum_vacuum_insert_scale_factor
>>> & autovacuum_vacuum_insert_threshold.
>>>
>>>
>>>
>>>> On Wed, Apr 21, 2021 at 1:48 PM Keith Fiske <
>>>> keith(dot)fiske(at)crunchydata(dot)com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 21, 2021 at 4:06 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>> A last_autoanalyze date in pg_stat_all_tables for a table we update
>>>>>> every day is 3/26, another is 3/13 -- I feel strongly somehow these tables
>>>>>> should be auto-analyzed and auto-vacuumed more often, and am wondering
>>>>>> where to look first.
>>>>>>
>>>>>> My autovacuum_analyze_threshold is 10, my autovacuum_vacuum_threshold
>>>>>> is 25.
>>>>>>
>>>>>> How can I better understand why these tables are not more frequently
>>>>>> being auto-analyzed and vacuumed?
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> --
>>>>>> Wells Oliver
>>>>>> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>>>>>>
>>>>>
>>>>> The threshold settings are only part of the calculation that
>>>>> determines when autovacuum kicks in. There's
>>>>> also autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor.
>>>>> The scale factors say that when this fraction of the table has been changed
>>>>> (updates/deletes) then run the relevant operation (vacuum or analyze).
>>>>>
>>>>> The threshold values are added on top of the scale factor values. So,
>>>>> what determines whether your analyze kicks in is actually
>>>>>
>>>>> (autovacuum_analyze_scale_factor * total row
>>>>> count) + autovacuum_analyze_threshold
>>>>>
>>>>> The reason it's like this is to avoid autovacuum kicking in
>>>>> excessively on very small tables when a small % changes. Ex. 10 rows on a
>>>>> 100 row table constantly kicking in autovacuum when scale factor is set to
>>>>> .10. So add on 100-500 for the threshold so autovacuum kicks in when a good
>>>>> number of rows actually changes.
>>>>>
>>>>> If you want a specific number of row changes to cause autovacuum to
>>>>> kick in, you can set the scale factor to zero for that specific table. I
>>>>> only really find that necessary myself on rather large tables though (10s
>>>>> of millions of rows).
>>>>>
>>>>> --
>>>>> Keith Fiske
>>>>> Senior Database Engineer
>>>>> Crunchy Data - http://crunchydata.com
>>>>>
>>>>
>>>>
>>>> --
>>>> Wells Oliver
>>>> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>>>>
>>>
>>>
>>> --
>>> Keith Fiske
>>> Senior Database Engineer
>>> Crunchy Data - http://crunchydata.com
>>>
>>
>>
>> --
>> Wells Oliver
>> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>>
>
>
> --
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com
>

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

In response to

Re: Diagnosing outdated auto analyze dates... at 2021-04-21 21:51:01 from Keith Fiske

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Victor Sudakov	2021-04-22 03:16:42	A streaming replica catching up
Previous Message	Keith Fiske	2021-04-21 21:51:01	Re: Diagnosing outdated auto analyze dates...