From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2024-10-28 16:55:35 |
Message-ID: | 0026f562-d403-4f97-b9dc-3fe3a279e8c9@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Jian! Thank you for your work on this topic!
On 28.10.2024 10:19, jian he wrote:
> * NOTE: returns NULL if clause is an OR or AND clause; it is the
> * responsibility of higher-level routines to cope with those.
> */
> static IndexClause *
> match_clause_to_indexcol(PlannerInfo *root,
> RestrictInfo *rinfo,
> int indexcol,
> IndexOptInfo *index)
>
> the above comments need a slight change.
>
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand = 1
> OR thousand = 3);
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: ((thousand = 1) OR (thousand = 3))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand in (1,3));
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> tenk2 index:
> Indexes:
> "tenk2_thous_tenthous" btree (thousand, tenthous)
>
> Looking at the above cases, I found out the "Recheck Cond" is
> different from "Index Cond".
> I wonder why there is a difference, or if they should be the same.
> then i come to:
> match_orclause_to_indexcol
>
> /*
> * Finally, build an IndexClause based on the SAOP node. Use
> * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
> * estimations because it may differ from the estimation made for an OR
> * clause. Although it is not a lossy expression, keep the old version of
> * rinfo in iclause->rinfo to detect duplicates and recheck the original
> * clause.
> */
> iclause = makeNode(IndexClause);
> iclause->rinfo = rinfo;
> iclause->indexquals = list_make1(make_simple_restrictinfo(root,
> &saopexpr->xpr));
> iclause->lossy = false;
> iclause->indexcol = indexcol;
> iclause->indexcols = NIL;
>
> looking at create_bitmap_scan_plan.
> I think "iclause->rinfo" itself won't be able to detect duplicates.
> since the upper code would mostly use "iclause->indexquals" for comparison?
>
>
> typedef struct IndexClause comments says:
> "
> * indexquals is a list of RestrictInfos for the directly-usable index
> * conditions associated with this IndexClause. In the simplest case
> * it's a one-element list whose member is iclause->rinfo. Otherwise,
> * it contains one or more directly-usable indexqual conditions extracted
> * from the given clause. The 'lossy' flag indicates whether the
> * indexquals are semantically equivalent to the original clause, or
> * represent a weaker condition.
> "
> should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
> since it meets the comment condition: "semantically equivalent to the
> original clause"
> or is the above comment slightly wrong?
>
> in match_orclause_to_indexcol
> i changed from
> iclause->rinfo = rinfo;
> to
> iclause->rinfo = make_simple_restrictinfo(root,
> &saopexpr->xpr);
>
> as expected. now the "Recheck Cond" is same as "Index Cond"
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> I am not sure of the implication of this change.
>
I may be wrong, but the original idea was to double-check the result
with the original expression.
But I'm willing to agree with you. I think we should add transformed
rinfo variable through add_predicate_to_index_quals function. I attached
the diff file to the letter.
diff --git a/src/backend/optimizer/path/indxpath.c
b/src/backend/optimizer/path/indxpath.c
index 3da7ea8ed57..c68ac7008e6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3463,10 +3463,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
* rinfo in iclause->rinfo to detect duplicates and recheck the
original
* clause.
*/
+ RestrictInfo *rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
iclause = makeNode(IndexClause);
- iclause->rinfo = rinfo;
- iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
+ iclause->rinfo = rinfo_new;
+ iclause->indexquals = add_predicate_to_index_quals(index,
list_make1(rinfo_new));
iclause->lossy = false;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;
I figured out comments that you mentioned and found some addition
explanation.
As I understand it, this processing is related to ensuring that the
selectivity of the index is assessed correctly and that there is no
underestimation, which can lead to the selection of a partial index in
the plan. See comment for the add_predicate_to_index_quals function:
* ANDing the index predicate with the explicitly given indexquals produces
* a more accurate idea of the index's selectivity. *However, we need to be
* careful not to insert redundant clauses, because
clauselist_selectivity()
* is easily fooled into computing a too-low selectivity estimate*. Our
* approach is to add only the predicate clause(s) that cannot be proven to
* be implied by the given indexquals. This successfully handles cases
such
* as a qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
* There are many other cases where we won't detect redundancy, leading
to a
* too-low selectivity estimate, which will bias the system in favor of
using
* partial indexes where possible. That is not necessarily bad though.
*
* *Note that indexQuals contains RestrictInfo nodes while the indpred
* does not, so the output list will be mixed. This is OK for both
* predicate_implied_by() and clauselist_selectivity()*, but might be
* problematic if the result were passed to other things.
*/
In those comments that you mentioned, it was written that this problem
of expression redundancy is checked using the predicate_implied_by
function, note that it is called there.
* In some situations (particularly with OR'd index conditions) we may *
have scan_clauses that are not equal to, but are logically implied by, *
the index quals; so we also try a predicate_implied_by() check to see *
if we can discard quals that way. (predicate_implied_by assumes its *
first input contains only immutable functions, so we have to check * that.)
I also figured out more information about loosy variable. First of all,
I tried changing the value of the variable and did not notice any
difference in regression tests. As I understood, our transformation is
completely equivalent, so loosy should be true. But I don't think they
are needed since our expressions are equivalent. I thought for a long
time about an example where this could be a mistake and didn’t come up
with any of them.
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
or_any.diff | text/x-patch | 11.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2024-10-28 16:58:08 | Re: protocol-level wait-for-LSN |
Previous Message | Heikki Linnakangas | 2024-10-28 16:52:15 | Re: Avoid possible overflow (src/port/bsearch_arg.c) |