From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PoC/WIP: Extended statistics on expressions |
Date: | 2021-08-16 01:32:55 |
Message-ID: | 20210816013255.GS10479@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote:
> > Looking at the current behaviour, there are a couple of things that
> > seem a little odd, even though they are understandable. For example,
> > the fact that
> >
> > CREATE STATISTICS s (expressions) ON (expr), col FROM tbl;
> >
> > fails, but
> >
> > CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl;
> >
> > succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax
> >
> > CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl;
> >
> > tends to suggest that it's going to create statistics on the pair of
> > expressions, describing their correlation, when actually it builds 2
> > independent statistics. Also, this error text isn't entirely accurate:
> >
> > CREATE STATISTICS s ON col FROM tbl;
> > ERROR: extended statistics require at least 2 columns
> >
> > because extended statistics don't always require 2 columns, they can
> > also just have an expression, or multiple expressions and 0 or 1
> > columns.
> >
> > I think a lot of this stems from treating "expressions" in the same
> > way as the other (multi-column) stats kinds, and it might actually be
> > neater to have separate documented syntaxes for single- and
> > multi-column statistics:
> >
> > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> > ON (expression)
> > FROM table_name
> >
> > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
> > [ ( statistics_kind [, ... ] ) ]
> > ON { column_name | (expression) } , { column_name | (expression) } [, ...]
> > FROM table_name
> >
> > The first syntax would create single-column stats, and wouldn't accept
> > a statistics_kind argument, because there is only one kind of
> > single-column statistic. Maybe that might change in the future, but if
> > so, it's likely that the kinds of single-column stats will be
> > different from the kinds of multi-column stats.
> >
> > In the second syntax, the only accepted kinds would be the current
> > multi-column stats kinds (ndistinct, dependencies, and mcv), and it
> > would always build stats describing the correlations between the
> > columns listed. It would continue to build standard/expression stats
> > on any expressions in the list, but that's more of an implementation
> > detail.
> >
> > It would no longer be possible to do "CREATE STATISTICS s
> > (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to
> > issue 2 separate "CREATE STATISTICS" commands, but that seems more
> > logical, because they're independent stats.
> >
> > The parsing code might not change much, but some of the errors would
> > be different. For example, the errors "building only extended
> > expression statistics on simple columns not allowed" and "extended
> > expression statistics require at least one expression" would go away,
> > and the error "extended statistics require at least 2 columns" might
> > become more specific, depending on the stats kind.
This still seems odd:
postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR: extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS
It seems wrong that the command works with added parens, but builds expression
stats on a simple column (which is redundant with what analyze does without
extended stats).
--
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Andy Fan | 2021-08-16 02:24:24 | Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series) |
Previous Message | Justin Pryzby | 2021-08-16 01:31:01 | Re: PoC/WIP: Extended statistics on expressions |