From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Potential autovacuum optimization: new tables
Date: 2012-10-13 01:49:48
Message-ID: 5078C8BC.5060303@agliodbs.com
Lists: pgsql-hackers
> No, it's not that easy. The question you have to ask is "when has that
> initial write burst stopped?". As an example, if autovacuum happened to
> see that table in the instant after CREATE, it might autovacuum it while
> it's still empty, and then this rule fails to trigger any further effort.
Well, frankly, it would be useful to know it's empty too. If you really
wanna see that 1000-row default estimate bite you on the tuchas, try a
JOIN against an empty new table.
> Personally I've always thought that autovacuum's rules should be based
> on a percentage of rows changed, not an absolute threshold (or maybe in
> addition to an absolute threshold). This way, if you create a table and
> insert 10 rows, that would make it subject to analyze on-sight, even if
> autovac had managed to pass by while it was still empty, because the
> percentage-changed is infinite. Then, if you insert the other 35 rows
> you meant to insert, it's *again* subject to autoanalyze on the next
> pass, because the percentage-changed is still 350%.
> I remember having got voted down on the percentage approach back when
> we first put AV into core, but I remain convinced that decision was a
> bad one.
Yeah, I was one of the ones voting against you. The reason not to have
percentage-only is for small tables. Imagine that you have a table with
18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.
Add two rows --> ANALYZE
UPDATE two rows --> ANALYZE
UPDATE three more rows --> ANALYZE
DELETE three rows --> ANALYZE
Without a threshold, any table under 100 rows which gets UPDATEs would
be continually in the autoanalyze queue, which would not be beneficial;
the churn in pg_statistic alone would be detrimental.
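To make the arithmetic concrete, here is a minimal sketch of the trigger condition (a table is queued for ANALYZE once changed tuples exceed threshold + scale_factor * reltuples); the function name is illustrative, not PostgreSQL's internal API:

```python
# Illustrative sketch of autovacuum's analyze trigger condition.
# needs_analyze() is a made-up name; the formula mirrors
# autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples.

def needs_analyze(changed_tuples, reltuples,
                  analyze_threshold=0, analyze_scale_factor=0.1):
    return changed_tuples > analyze_threshold + analyze_scale_factor * reltuples

# The 18-row table above, with threshold 0 and scale factor 0.1:
# touching just 2 rows (> 1.8) re-queues ANALYZE every single pass.
print(needs_analyze(2, 18))                        # True: 2 > 0.1 * 18 = 1.8

# An absolute threshold (PostgreSQL's default is 50) damps that churn:
print(needs_analyze(2, 18, analyze_threshold=50))  # False: 2 <= 50 + 1.8
```

That absolute threshold is exactly what keeps the tiny-table churn out of pg_statistic.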
I guess the real issue is that we tried to solve the problem of ANALYZEing
tables of radically different sizes with a simple threshold+scale_factor
heuristic, and that's proving too simple for actual production sites. The
other end where autoanalyze often falls down is the high end (tables with
a million rows or more).
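The high-end failure is also easy to see in numbers. A sketch with PostgreSQL's stock defaults (autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1):

```python
# Worked arithmetic for a million-row table under the default settings.
threshold, scale_factor, reltuples = 50, 0.1, 1_000_000

# Changed tuples required before autoanalyze fires:
rows_before_analyze = threshold + scale_factor * reltuples
print(rows_before_analyze)  # 100050.0
```

Over a hundred thousand rows can change before the planner gets fresh statistics, which is where the stale-stats complaints on big tables come from.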
Can anyone think of a new heuristic which doesn't involve adding 2-6 new
GUC knobs?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Next Message: Tom Lane, 2012-10-13 02:03:18, Re: Potential autovacuum optimization: new tables
Previous Message: Stephen Frost, 2012-10-13 01:25:33, Re: Potential autovacuum optimization: new tables