Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: jian he <jian(dot)universality(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-08-26 10:41:01
Message-ID: fe11e5c9-877e-4306-a6a9-122125c7d2e5@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 26.08.2024 06:12, jian he wrote:
> On Fri, Aug 23, 2024 at 8:58 PM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
> based on v37.
>
>
> + {
> + /*
> + * We have only Const's. In this case we can construct an array
> + * directly.
> + */
> + int16 typlen;
> + bool typbyval;
> + char typalign;
> + Datum *elems;
> + int i = 0;
> + ArrayType *arrayConst;
> +
> + get_typlenbyvalalign(consttype, &typlen, &typbyval, &typalign);
> +
> + elems = (Datum *) palloc(sizeof(Datum) * list_length(consts));
> + foreach(lc, consts)
> + elems[i++] = ((Const *) lfirst(lc))->constvalue;
> +
> + arrayConst = construct_array(elems, i, consttype,
> + typlen, typbyval, typalign);
> + arrayNode = (Node *) makeConst(arraytype, -1, inputcollid,
> + -1, PointerGetDatum(arrayConst),
> + false, false);
> +
> + pfree(elems);
> + list_free(consts);
> + }
> List "consts" elements can be NULL?
> I didn't find a query to trigger that.
> but construct_array comments says
> "elems (NULL element values are not supported)."
> Do we need to check Const->constisnull for the Const node?

I didn't find any problems here either, but the query plan seems strange
to me: a form of OR expressions is added to the recheck condition. But
we discussed this before and came to the conclusion that this is not a
mistake.

I added the query to the create_index.sql

 EXPLAIN (COSTS OFF)
+SELECT * FROM tenk1
+  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous =
42 or tenthous is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1
+   Recheck Cond: (((thousand = 42) AND (tenthous IS NULL)) OR
((thousand = 42) AND ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))))
+   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42) OR
(tenthous IS NULL))
+   ->  BitmapOr
+         ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Index Cond: ((thousand = 42) AND (tenthous IS NULL))
+         ->  Bitmap Index Scan on tenk1_thous_tenthous
+               Index Cond: ((thousand = 42) AND (tenthous = ANY
('{1,3,42}'::integer[])))
+(8 rows)

I noticed that the NULL element is not added to the converted array
because it belongs to a different group.

So, I think this problem may not be affect us. Gere we should add
Assertion that an element is not null.

> + /* Construct the list of nested OR arguments */
> + for (j = group_start; j < i; j++)
> + {
> + Node *arg = list_nth(orargs, matches[j].argindex);
> +
> + rargs = lappend(rargs, arg);
> + if (IsA(arg, RestrictInfo))
> + args = lappend(args, ((RestrictInfo *) arg)->clause);
> + else
> + args = lappend(args, arg);
> + }
> the ELSE branch never reached?
>
Reached - if your arg is BoolExpr type, for example if it consists "And"
expressions.
> + /* Construct the nested OR and wrap it with RestrictInfo */
> + *subrinfo = *rinfo;
> + subrinfo->clause = make_orclause(args);
> + subrinfo->orclause = make_orclause(rargs);
> + result = lappend(result, subrinfo);
> should we use memcpy instead of " *subrinfo = *rinfo;"?
>
>
> + /* Sort clauses to make similar clauses go together */
> + pg_qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);
> Should we use qsort?
> since comments in pg_qsort:
> /*
> * Callers should use the qsort() macro defined below instead of calling
> * pg_qsort() directly.
> */
I think yes, we should.
> +/*
> + * Data structure representing information about OR-clause argument and its
> + * matching index key. Used for grouping of similar OR-clause arguments in
> + * group_similar_or_args().
> + */
> +typedef struct
> +{
> + int indexnum; /* index of the matching index */
> + int colnum; /* index of the matching column */
> + Oid opno; /* OID of the OpClause operator */
> + Oid inputcollid; /* OID of the OpClause input collation */
> + int argindex; /* index of the clause in the list of
> + * arguments */
> +} OrArgIndexMatch;
>
> I am not 100% sure about the comments.
> indexnum: index of the matching index reside in rel->indexlist that
> matches (counting from 0)
> colnum: the column number of the matched index (counting from 0)
>
To be honest, I'm not sure that I completely understand your point here.

I have found an interesting case here:

+EXPLAIN (COSTS OFF) +SELECT * FROM tenk1 + WHERE thousand = 42 AND
(stringu1 = 'MAAAAA' OR stringu1 = 'TUAAAA'::text OR stringu1 =
'OBAAAA'::text); + QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 + Recheck Cond: ((thousand = 42) AND
((stringu1 = 'MAAAAA'::name) OR ((stringu1 = 'TUAAAA'::text) OR
(stringu1 = 'OBAAAA'::text)))) + Filter: ((stringu1 = 'MAAAAA'::name) OR
(stringu1 = 'TUAAAA'::text) OR (stringu1 = 'OBAAAA'::text)) + ->
BitmapAnd + -> Bitmap Index Scan on tenk1_thous_tenthous + Index Cond:
(thousand = 42) + -> BitmapOr + -> Bitmap Index Scan on stringu1_idx +
Index Cond: (stringu1 = 'MAAAAA'::name) + -> Bitmap Index Scan on
stringu1_idx + Index Cond: (stringu1 = ANY ('{TUAAAA,OBAAAA}'::text[]
COLLATE "C")) +(11 rows) +

If OR constants have different types, then they belong to different
groups, and I think that's unfair. I think that conversion to a single
type should be used here - while I’m working on this, I’ll send the code
in the next letter.

And I noticed that there were some tests missing on this, so I added this.

I've updated the patch file to include my and Jian's suggestions, as
well as the diff file if there's no objection.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
diff.diff.no-cfbot text/plain 6.2 KB
v38-0001-Transform-OR-clauses-to-SAOP-s-during-index-matching.patch text/x-patch 41.6 KB
v38-0002-Teach-bitmap-path-generation-about-transforming-OR-c.patch text/x-patch 28.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2024-08-26 10:48:03 Re: Optimize mul_var() for var1ndigits >= 8
Previous Message shveta malik 2024-08-26 10:36:29 Re: Conflict detection and logging in logical replication