From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2018-03-27 00:36:09 |
Message-ID: | 49348328-1a02-7e45-e50f-e1d4d0243c5c@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Dean,
Here is an updated patch (hopefully) fixing the bugs you've reported so
far. In particular, it fixes this:
1) mostly harmless memset bug in UpdateStatisticsForTypeChange
2) passing the right list (stat_clauses) to mcv_clauselist_selectivity
3) corrections to a couple of outdated comments
4) handling of NOT clauses in MCV lists (and in histograms)
The query you posted does not fail anymore, but there's a room for
improvement. We should be able to handle queries like this:
select * from foo where a=1 and not b=1;
But we don't, because we only recognize F_EQSEL, F_SCALARLTSEL and
F_SCALARGTSEL, but F_NEQSEL (which is what "not b=1" uses). Should be
simple to fix, I believe.
5) handling of mcv_lowsel in statext_clauselist_selectivity
I do believe the new behavior is correct - as I suspected, I broke this
during the last rebase, where I also moved some stuff from the histogram
part to the MCV part. I've also added the (sum of MCV frequencies), as
you suggested.
I think we could improve the estimate further by computing ndistinct
estimate, and then using that to compute average frequency of non-MCV
items. Essentially what var_eq_const does:
if (otherdistinct > 1)
selec /= otherdistinct;
Not sure how to do that when there are not just equality clauses.
BTW I think there's a bug in handling the fullmatch flag - it should not
be passed to AND/OR subclauses the way it is, because then
WHERE a=1 OR (a=2 AND b=2)
will probably set it to 'true' because of (a=2 AND b=2). Which will
short-circuit the statext_clauselist_selectivity, forcing it to ignore
the non-MCV part.
But that's something I need to look at more closely tomorrow. Another
thing I probably need to do is to add more regression tests, protecting
against bugs similar to those you found.
Thanks for the feedback so far!
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-multivariate-MCV-lists-20180327.patch.gz | application/gzip | 33.5 KB |
0002-multivariate-histograms-20180327.patch.gz | application/gzip | 42.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2018-03-27 00:38:42 | Re: Parallel Aggregates for string_agg and array_agg |
Previous Message | Alvaro Herrera | 2018-03-27 00:27:40 | Re: ppc64le support in 9.3 branch? |