From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Manuel Rigger <rigger(dot)manuel(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Failed assertion clauses != NIL |
Date: | 2019-11-24 23:40:11 |
Message-ID: | 20191124234011.iq7wnq2gohdcsroj@development |
Lists: | pgsql-bugs |
Hi,
Attached are two patches related to this bug report.
0001 - Fix choose_best_statistics to check clauses individually
---------------------------------------------------------------
This modifies the choose_best_statistics function to properly check
which clauses are actually covered by each statistic object, and only
use attnums from those.
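A minimal standalone sketch of the idea follows. It is not the actual `choose_best_statistics` code. It assumes a simplified model in which attnums are bits in a `uint64_t` rather than PostgreSQL's `Bitmapset`, and `attset`, `StatInfo` and `choose_best_stat_sketch` are hypothetical names. Each statistics object is credited only with the attnums of clauses that it fully covers and that are not yet estimated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified model: attnums 0..63 stored as bits. */
typedef uint64_t attset;

typedef struct StatInfo
{
    attset keys;            /* columns covered by this statistics object */
} StatInfo;

/* Number of attnums in the set. */
static int
attset_count(attset s)
{
    int n = 0;

    while (s)
    {
        s &= s - 1;         /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Return the index of the statistics object matching the most attnums,
 * counting only clauses that are (a) not yet estimated and (b) fully
 * covered by the object's keys.  Returns -1 if no object matches at
 * least two attnums.
 */
static int
choose_best_stat_sketch(const StatInfo *stats, int nstats,
                        const attset *clause_attnums, int nclauses,
                        const bool *estimated)
{
    int best = -1;
    int best_matched = 1;   /* assumption: need two or more columns */

    assert(nstats >= 0 && nclauses >= 0);

    for (int i = 0; i < nstats; i++)
    {
        attset matched = 0;

        for (int c = 0; c < nclauses; c++)
        {
            if (estimated[c])
                continue;
            /* usable only if every attnum of the clause is a key */
            if ((clause_attnums[c] & ~stats[i].keys) == 0)
                matched |= clause_attnums[c];
        }

        int n = attset_count(matched);

        if (n > best_matched)
        {
            best = i;
            best_matched = n;
        }
    }
    return best;
}
```

With a statistic on attnums {0, 1, 2} and an OR clause on {2, 3}, intersecting the union of all clause attnums with the keys would credit attnum 2 to that object even though the OR clause cannot be estimated from it. The per-clause check above ignores it, so an object is never chosen on the strength of clauses it cannot actually estimate.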
The patch ended up pretty small, because we already have all the
necessary info (per-clause attnums) precalculated, so this should not
be much more expensive than before.
The main drawback is that this does change the signature of a function
declared in statistics.h - we have to pass more info (per-clause bitmaps
and info about which clauses are already estimated), which means an ABI
break.
I'm not sure how likely it is that external code calls this function,
but the probability is non-zero. So even though the patch is fairly
small, maybe in the back branches we should use the simple fix of just
returning when the list is NIL.
0002 - WIP: Use extended statistics to estimate OR clauses
----------------------------------------------------------
No matter what 0001 does, it's clear the current code fails to handle OR
clauses that are not fully covered by an extended statistic. For AND
clauses that's not an issue - we simply estimate the covered ones, and
then add the remaining ones by assuming independence.
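To make the AND case concrete, here is a hedged sketch under simplifying assumptions. The clauses covered by an extended statistic are assumed to already have one joint selectivity `ext_sel` (1.0 if nothing is covered), and every other clause has its own per-clause estimate. The function name is hypothetical, and PostgreSQL's real code works on clause lists rather than arrays:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: start from the joint selectivity of the clauses
 * covered by an extended statistic, then multiply in every remaining
 * clause, assuming it is independent of the others.
 */
static double
and_selectivity_sketch(double ext_sel, const double *sel,
                       const bool *covered, int n)
{
    double s = ext_sel;

    assert(ext_sel >= 0.0 && ext_sel <= 1.0);

    for (int i = 0; i < n; i++)
    {
        if (covered[i])
            continue;           /* already accounted for in ext_sel */
        assert(sel[i] >= 0.0 && sel[i] <= 1.0);
        s *= sel[i];            /* independence assumption */
    }
    return s;
}
```

A partially covered AND list is therefore never a problem: whatever the statistic does not cover simply falls through to the plain product.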
But clauselist_selectivity sees an OR clause as a single clause, and
clause_selectivity() simply uses the (s1+s2-s1*s2) formula without
considering extended statistics in the is_orclause branch. (For
is_andclause we call clauselist_selectivity recursively, which does
consider extended stats, of course.)
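For reference, a small sketch of the independence-based OR estimate. The function names are hypothetical. Folding s1+s2-s1*s2 over the arms of an OR is equivalent to 1 - (1-s1)(1-s2)...(1-sn):

```c
#include <assert.h>

/* P(A OR B) = s1 + s2 - s1*s2, valid only if A and B are independent. */
static double
or_pair(double s1, double s2)
{
    assert(s1 >= 0.0 && s1 <= 1.0);
    assert(s2 >= 0.0 && s2 <= 1.0);
    return s1 + s2 - s1 * s2;
}

/* Fold the pairwise formula over all arms of an OR clause. */
static double
or_selectivity_independent(const double *sel, int n)
{
    double s = 0.0;             /* an empty OR matches nothing */

    for (int i = 0; i < n; i++)
        s = or_pair(s, sel[i]);
    return s;
}
```

This is where correlated columns hurt. If columns a and b always hold equal values and each of (a = 1) and (b = 1) has selectivity 0.5, the formula yields 0.75, while the true selectivity of (a = 1 OR b = 1) is 0.5.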
This commit addresses this by adding a clauselist_selectivity variant
for clauses connected by OR, and calling it from the is_orclause branch.
That in turn requires a bunch of changes elsewhere, to propagate the
is_or flag properly, etc.
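One plausible shape for such an OR variant, by analogy with the AND case, is sketched below. This is an assumption for illustration, not the WIP patch's code. Here `ext_or_sel` is taken to be the joint selectivity of the OR of the covered arms, computed from an extended statistic (0.0 if no arm is covered), and the uncovered arms are folded in with the independence formula. The function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: start from the extended-statistics estimate for
 * the OR of the covered arms, then fold in each uncovered arm using
 * s + s_i - s*s_i (independence assumption).
 */
static double
or_selectivity_with_ext_sketch(double ext_or_sel, const double *sel,
                               const bool *covered, int n)
{
    double s = ext_or_sel;

    assert(ext_or_sel >= 0.0 && ext_or_sel <= 1.0);

    for (int i = 0; i < n; i++)
    {
        if (covered[i])
            continue;           /* already included in ext_or_sel */
        assert(sel[i] >= 0.0 && sel[i] <= 1.0);
        s = s + sel[i] - s * sel[i];
    }
    return s;
}
```

With two perfectly correlated arms of selectivity 0.5 (joint OR estimate 0.5) plus an independent arm of 0.25, this gives 0.625. Applying the formula to all three arms gives 0.8125. Only the uncovered arm relies on independence.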
This is clearly not something we could (or would want to) backpatch, and
at this point it's nowhere close to committable. It's more a WIP patch
highlighting places that will need tweaking to make this work.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-choose_best_statistics-to-check-clauses-individu.patch.gz | application/gzip | 2.8 KB |
0002-WIP-Use-extended-statistics-to-estimate-OR-clauses.patch.gz | application/gzip | 2.9 KB |