| From: | Bruce Momjian <bruce(at)momjian(dot)us> |
|---|---|
| To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
| Subject: | Re: v10 release notes for extended stats |
| Date: | 2020-12-19 20:11:05 |
| Message-ID: | 20201219201105.GI28841@momjian.us |
| Lists: | pgsql-hackers |
On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote:
> 2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
> 2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
> 2017-05-12 [bc085205c] Change CREATE STATISTICS syntax
>
> The existing notes say:
> |Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
> |New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.
>
> "correlation ratio" is referring to stxkind=d (dependencies), right? That's
> very unclear.
>
> "helpful in estimating query memory usage": I guess it means that this allows
> the planner to correctly account for large vs small number of GROUP BY values,
> but it sounds more like it's going to help a user to estimate memory use.
>
> "when combining the statistics from individual columns." this is referring to
> stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user
> to know that.
>
> Also, maybe it should say "combining stats from columns OF THE SAME TABLE".
>
> So I propose:
> |Allow creation of multi-column statistics objects, for computing the
> |dependencies between columns and number of distinct values of combinations of columns
> |(Tomas Vondra, David Rowley, Álvaro Herrera)
> |The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |Improved statistics allow the planner to generate better query plans with more accurate
> |estimates of the row count and memory usage when grouping by multiple
> |columns, and more accurate estimates of the row count if WHERE clauses apply
> |to multiple columns and values of some columns are correlated with values of
> |other columns.
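
To make the effect described in the proposed text concrete, here is a toy sketch in Python. It is not the planner's code; the table, its columns a and b, and the helper selectivity() are all hypothetical, and the "independent" estimates simply multiply per-column figures, which is a simplified model of what happens without multi-column statistics.

```python
# Hypothetical toy table of (a, b) rows in which b is fully determined by a
# (b = a // 10), so the two columns are strongly correlated.
rows = [(i % 100, (i % 100) // 10) for i in range(10_000)]
n = len(rows)


def selectivity(pred):
    """Fraction of rows matching a single-column predicate."""
    return sum(1 for r in rows if pred(r)) / n


# WHERE a = 42 AND b = 4, estimated as if the clauses were independent.
sel_a = selectivity(lambda r: r[0] == 42)
sel_b = selectivity(lambda r: r[1] == 4)
independent_estimate = round(sel_a * sel_b * n)

# True row count: a = 42 already implies b = 4, so the second clause is redundant.
actual_rows = sum(1 for r in rows if r[0] == 42 and r[1] == 4)

# GROUP BY a, b: product of per-column distinct counts vs. the real number
# of distinct (a, b) combinations.
groups_independent = len({r[0] for r in rows}) * len({r[1] for r in rows})
groups_actual = len({(r[0], r[1]) for r in rows})

print("WHERE estimate:", independent_estimate, "actual:", actual_rows)
print("GROUP BY estimate:", groups_independent, "actual:", groups_actual)
```

In this toy data the independence assumption underestimates the WHERE row count and overestimates the number of groups, which are the two cases the dependencies and n-distinct statistics are meant to address.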
Uh, at the time, that was the best text we could come up with. We don't
usually go back to update them unless there is a very good reason, and I
am not seeing that above.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee