pgsql: Apply multiple multivariate MCV lists when possible

From: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Apply multiple multivariate MCV lists when possible
Date: 2020-01-13 00:21:48
Message-ID: E1iqnUK-00054q-Cx@gemulon.postgresql.org
Lists: pgsql-committers

Apply multiple multivariate MCV lists when possible

Until now we've used only a single multivariate MCV list per relation,
the one covering the largest number of clauses. For example, given a query

SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 AND d = 1

and extended statistics on (a,b) and (c,d), we'd pick and use only one
of them. This commit improves on this by repeatedly picking and applying
the best statistics object (the one matching the largest number of
remaining clauses) until no additional statistics object is applicable.

This greedy algorithm is simple, but may not be optimal. A different
choice of statistics may leave fewer clauses unestimated and/or give
better estimates for some other reason.

However, this can happen only when there are overlapping statistics and
selecting one makes it impossible to use another. For example, with
statistics on (a,b), (c,d) and (b,c,d), we may pick either (a,b) and
(c,d), or (b,c,d), and it's not clear which option is the best one.

We assume, however, that cases like this are rare, and the easiest
solution is to define statistics covering the whole group of correlated
columns. In the future we might support overlapping statistics, using some
of the clauses as conditions (in the conditional-probability sense).
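
As a purely illustrative example (the commit does not specify any formula),
overlapping statistics on (a,b) and (b,c,d) could be combined by
conditioning on the shared clause b = 1. The first line is the exact chain
rule; the approximation additionally assumes that a is conditionally
independent of (c,d) given b:

```latex
P(a{=}1 \wedge b{=}1 \wedge c{=}1 \wedge d{=}1)
  = P(a{=}1 \wedge b{=}1)\, P(c{=}1 \wedge d{=}1 \mid a{=}1 \wedge b{=}1)
  \approx P(a{=}1 \wedge b{=}1)\,
    \frac{P(b{=}1 \wedge c{=}1 \wedge d{=}1)}{P(b{=}1)}
```

Here the first factor would come from the (a,b) statistics and the numerator
of the fraction from the (b,c,d) statistics.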

Author: Tomas Vondra
Reviewed-by: Mark Dilger, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/eae056c19ee8f5ebc45ac0fe13181f91c8791e00

Modified Files
--------------
src/backend/statistics/extended_stats.c | 139 +++++++++++++++++---------------
src/test/regress/expected/stats_ext.out | 57 +++++++++++++
src/test/regress/sql/stats_ext.sql | 35 ++++++++
3 files changed, 167 insertions(+), 64 deletions(-)
