Quick Links

O(N^2) when building multi-column MCV lists

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	O(N^2) when building multi-column MCV lists
Date:	2019-06-18 21:43:13
Message-ID:	20190618214313.xrrufnkewq3x63lv@development
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

When experimenting with multi-column MCV lists with statistic target set
to high value (e.g. 10k), I've realized there's an O(N^2) issue in
statext_mcv_build() when computing base frequencies.

We do this:

for (i = 0; i < nitems; i++)
{
...
item->base_frequency = 1.0;
for (j = 0; j < numattrs; j++)
{
int count = 0;
int k;

for (k = 0; k < ngroups; k++)
{
if (multi_sort_compare_dim(j, &groups[i], &groups[k], mss) == 0)
count += groups[k].count;
}

...
}
}

That is, for each item on the MCV list, we walk through all the groups
(for each dimension independently) to determine the total frequency of
the value.

With many groups (which can easily happen for high statistics target),
this can easily get very expensive.

I think the best solution here is to pre-compute frequencies for values
in all dimensions, and then just access that instead of looping through
the groups over and over.

IMHO this is something we should fix for PG12, so I'll put that on the
open items list, and produce a fix shortly.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Re: O(N^2) when building multi-column MCV lists at 2019-06-21 10:25:50 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2019-06-18 22:16:40	PL/Python fails on new NetBSD/PPC 8.0 install
Previous Message	Tomas Vondra	2019-06-18 21:33:57	Multivariate MCV list vs. statistics target