Re: Efficiently Triggering Autovacuum Analyze?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>, Cory Tucker <cory(dot)tucker(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Efficiently Triggering Autovacuum Analyze?
Date: 2015-12-31 16:55:45
Message-ID: 56855E11.1060907@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/30/15 1:31 PM, Joe Conway wrote:
> On 12/30/2015 11:09 AM, Cory Tucker wrote:
> With this scenario you can expect an autoanalyze every 5 million rows
> and autovacuum every 10 million. In my experience (and based on your
> description, yours as well) this is not often enough. Not only that,
> when it does run it runs longer than you would like, causing an I/O hit
> while it does.
>
> You probably should tune this table specifically, e.g.

Another option is to explicitly analyze then SELECT from the table after
you're done inserting into it. The advantage is it doesn't tie up an
autovac worker and you can ensure that the newly added tuples get
properly hinted.

You can run the ANALYZE immediately after your insert finishes. The
reason to do that is to get up-to-date statistics for other queries to
use. That can be particularly important if the new rows have values
significantly outside what was in the table before. That's common with
things like sequence IDs and timestamp data.

The SELECT is a bit trickier; you want to ensure that there is no
transaction still running in the database that's older than the
transaction that added all the new data. You can check that by comparing
the xmin field of one of your new rows with
txid_snapshot_xmin(txid_current_snapshot()). Note that because of
wraparound you can't do a simple comparison; txid 3 is actually greater
than txid 2^32.

The whole point of this SELECT is to get the new tuples hinted while the
pages are still hot in cache. If you don't do that, the next query that
reads the tuple will have to set the hints, which also dirties the page.
VACUUM does that too, but there's really no point in having vacuum run
through the entire table just to set hints on less than 1% of it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jim Nasby 2015-12-31 17:00:21 Re: cannot get stable function to use index
Previous Message Shay Cohavi 2015-12-31 08:29:27 how to decrease the promotion time when performing a multiple failovers.....