From: Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: New GUC autovacuum_max_threshold ?
Date: 2024-04-26 08:10:20
Message-ID: ba3a146c-1bdb-45f4-a7fa-02713783a811@dalibo.com
Lists: pgsql-hackers
On 25/04/2024 at 22:21, Robert Haas wrote:
> The analyze case, I feel, is really murky.
> autovacuum_analyze_scale_factor stands for the proposition that as the
> table becomes larger, analyze doesn't need to be done as often. If
> what you're concerned about is the frequency estimates, that's true:
> an injection of a million new rows can shift frequencies dramatically
> in a small table, but the effect is blunted in a large one. But a lot
> of the cases I've seen have involved the histogram boundaries. If
> you're inserting data into a table in increasing order, every new
> million rows shifts the boundary of the last histogram bucket by the
> same amount. You either need those rows included in the histogram to
> get good query plans, or you don't. If you do, the frequency with
> which you need to analyze does not change as the table grows. If you
> don't, then it probably does. But the answer doesn't really depend on
> how big the table is already, but on your workload. So it's unclear to
> me that the proposed parameter is the right idea here at all. It's
> also unclear to me that the existing system is the right idea. 🙂
This is very interesting. And what about ndistinct? I believe it could
be problematic, too, in some (admittedly rare or pathological) cases.
For example, suppose that the actual number of distinct values in a
particular column grows from 1,000 to 200,000 after a batch of
insertions. Granted, in such a case, the default analyze sampling isn't
large enough to compute an ndistinct close to reality anyway. But with
no analyze at all, the result can be very bad planning: think of a
Nested Loop with a parallel seq scan on the outer table instead of a
simple, efficient index scan, because the inner table's index scan is
overestimated (both the cost of each index scan and the number of rows
returned).
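To illustrate why the default sample can badly underestimate a jumped
ndistinct, here is a small Python sketch (not PostgreSQL code; the table
size, distinct count, and 30,000-row sample size, i.e. the default
default_statistics_target of 100 times 300, are assumed figures for the
sake of the example, and the uniform value distribution is a
simplification):

```python
import random

# Hypothetical table: 10 million rows, 200,000 distinct values in one
# column, each value repeated ~50 times (uniform distribution).
N_ROWS = 10_000_000
N_DISTINCT = 200_000
SAMPLE_SIZE = 30_000  # default_statistics_target (100) * 300 rows

random.seed(42)

# Because every value occurs equally often, sampling rows uniformly is
# equivalent to sampling values uniformly, so we can skip materializing
# the table and just draw values directly.
sample = [random.randrange(N_DISTINCT) for _ in range(SAMPLE_SIZE)]
d_seen = len(set(sample))

print(f"distinct values seen in sample: {d_seen} of {N_DISTINCT}")
```

With these numbers the sample only ever sees a small fraction of the
200,000 distinct values (roughly 28,000 in expectation), so even the
duplicate-based extrapolation analyze performs starts from very little
signal; and with no analyze at all, the planner keeps working from the
old ndistinct of 1,000.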