Re: POC, WIP: OR-clause support for indexes

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-11-15 13:12:49
Message-ID: CAPpHfdvjtEWqjVcPd3-JQw8yCoppMXjK8kHnvinxBXGMZt-M_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Jian!

On Mon, Oct 28, 2024 at 9:19 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> * NOTE: returns NULL if clause is an OR or AND clause; it is the
> * responsibility of higher-level routines to cope with those.
> */
> static IndexClause *
> match_clause_to_indexcol(PlannerInfo *root,
> RestrictInfo *rinfo,
> int indexcol,
> IndexOptInfo *index)
>
> the above comments need a slight change.
>
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand = 1
> OR thousand = 3);
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: ((thousand = 1) OR (thousand = 3))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand in
(1,3));
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> tenk2 index:
> Indexes:
> "tenk2_thous_tenthous" btree (thousand, tenthous)
>
> Looking at the above cases, I found out the "Recheck Cond" is
> different from "Index Cond".
> I wonder why there is a difference, or if they should be the same.
> then i come to:
> match_orclause_to_indexcol
>
> /*
> * Finally, build an IndexClause based on the SAOP node. Use
> * make_simple_restrictinfo() to get RestrictInfo with clean
selectivity
> * estimations because it may differ from the estimation made for an
OR
> * clause. Although it is not a lossy expression, keep the old
version of
> * rinfo in iclause->rinfo to detect duplicates and recheck the
original
> * clause.
> */
> iclause = makeNode(IndexClause);
> iclause->rinfo = rinfo;
> iclause->indexquals = list_make1(make_simple_restrictinfo(root,
>
&saopexpr->xpr));
> iclause->lossy = false;
> iclause->indexcol = indexcol;
> iclause->indexcols = NIL;
>
> looking at create_bitmap_scan_plan.
> I think "iclause->rinfo" itself won't be able to detect duplicates.
> since the upper code would mostly use "iclause->indexquals" for
comparison?
>
>
> typedef struct IndexClause comments says:
> "
> * indexquals is a list of RestrictInfos for the directly-usable index
> * conditions associated with this IndexClause. In the simplest case
> * it's a one-element list whose member is iclause->rinfo. Otherwise,
> * it contains one or more directly-usable indexqual conditions extracted
> * from the given clause. The 'lossy' flag indicates whether the
> * indexquals are semantically equivalent to the original clause, or
> * represent a weaker condition.
> "
> should lossy be iclause->lossy be true at the end of
match_orclause_to_indexcol?
> since it meets the comment condition: "semantically equivalent to the
> original clause"
> or is the above comment slightly wrong?
>
> in match_orclause_to_indexcol
> i changed from
> iclause->rinfo = rinfo;
> to
> iclause->rinfo = make_simple_restrictinfo(root,
> &saopexpr->xpr);
>
> as expected. now the "Recheck Cond" is same as "Index Cond"
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> I am not sure of the implication of this change.

As comment says IndexClause.rinfo must be original restriction or join
clause.

typedef struct IndexClause
{
pg_node_attr(no_copy_equal, no_read, no_query_jumble)

NodeTag type;
struct RestrictInfo *rinfo; /* original restriction or join clause */

I don't see any reason why should we violate that. Note that there are
already cases when "Recheck Cond" doesn't match "Index Cond". For instance:

# explain select * from t where 100000 > i;
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=1860.66..7524.75 rows=99127 width=4)
Recheck Cond: (100000 > i)
-> Bitmap Index Scan on t_i_idx (cost=0.00..1835.88 rows=99127 width=0)
Index Cond: (i < 100000)
(4 rows)

Thus, this type of mismatch seems normal to me.

IndexClause.lossy should be false in our case (as it is). Lossy
transformation happens when there are cases of false positives in the
transformed clause. The comment gives an example of transformation "x
LIKE 'foo%bar'" into "x >= 'foo' AND x < 'fop'". In this case, 'fooqux'
would be case of false positive matching transformed clause but not
matching the original clause. In our case, original and transformed
clauses are equivalent. Therefore our transformation isn't lossy.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2024-11-15 13:12:50 Re: FW: Building Postgres 17.0 with meson
Previous Message Mats Kindahl 2024-11-15 13:05:40 Re: Potential ABI breakage in upcoming minor releases