pgsql: Speed-up build of MCV lists with many distinct values

From: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Speed-up build of MCV lists with many distinct values
Date: 2019-07-05 15:55:28
Message-ID: E1hjQYa-0007zV-0X@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Speed-up build of MCV lists with many distinct values

When building multi-column MCV lists, we compute base frequency for each
item, i.e. a product of per-column frequencies for values from the item.
As a value may be in multiple groups, the code was scanning the whole
array of groups while adding items to the MCV list. This works fine as
long as the number of distinct groups is small, but it's easy to trigger
trigger O(N^2) behavior, especially after increasing statistics target.

This commit precomputes frequencies for values in all columns, so that
when computing the base frequency it's enough to make a simple bsearch
lookup in the array.

Backpatch to 12, where multi-column MCV lists were introduced.

Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development

Branch
------
REL_12_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/57f459cf6c48e78aa321ac88418815fc65197517

Modified Files
--------------
src/backend/statistics/mcv.c | 130 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 122 insertions(+), 8 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2019-07-05 16:32:59 pgsql: Add \warn command to psql.
Previous Message Thomas Munro 2019-07-05 09:05:48 pgsql: Improve comment in postgresql.conf.sample.