From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | O(N^2) when building multi-column MCV lists |
Date: | 2019-06-18 21:43:13 |
Message-ID: | 20190618214313.xrrufnkewq3x63lv@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
When experimenting with multi-column MCV lists with statistic target set
to high value (e.g. 10k), I've realized there's an O(N^2) issue in
statext_mcv_build() when computing base frequencies.
We do this:
for (i = 0; i < nitems; i++)
{
...
item->base_frequency = 1.0;
for (j = 0; j < numattrs; j++)
{
int count = 0;
int k;
for (k = 0; k < ngroups; k++)
{
if (multi_sort_compare_dim(j, &groups[i], &groups[k], mss) == 0)
count += groups[k].count;
}
...
}
}
That is, for each item on the MCV list, we walk through all the groups
(for each dimension independently) to determine the total frequency of
the value.
With many groups (which can easily happen for high statistics target),
this can easily get very expensive.
I think the best solution here is to pre-compute frequencies for values
in all dimensions, and then just access that instead of looping through
the groups over and over.
IMHO this is something we should fix for PG12, so I'll put that on the
open items list, and produce a fix shortly.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-06-18 22:16:40 | PL/Python fails on new NetBSD/PPC 8.0 install |
Previous Message | Tomas Vondra | 2019-06-18 21:33:57 | Multivariate MCV list vs. statistics target |