Re: POC, WIP: OR-clause support for indexes

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-03-11 05:13:02
Message-ID: 7389d0dd-05d5-41b7-a12d-2e73f939f851@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/3/2024 21:51, Alexander Korotkov wrote:
> Hi!
>
> On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > On 5/3/2024 12:30, Andrei Lepikhov wrote:
> > > On 4/3/2024 09:26, jian he wrote:
> > ... and the new version of the patchset is attached.
>
> I made some revisions for the patchset.
Great!
> 1) Use hash_combine() to combine hash values.
Looks better
> 2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.

I'm not convinced about this limit. The initial reason was to combine
long lists of ORs into the array because such a transformation made at
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition
routine but not in the creation one.

> 3) Better save the original order of clauses by putting hash entries and
> untransformable clauses to the same list.  A lot of differences in
> regression tests output have gone.
I agree that reducing the number of changes in regression tests looks
better. But to achieve this, you introduced a hack that increases the
complexity of the code. Is it worth it? Maybe it would be better to make
one-time changes in tests instead of getting this burden on board. Or
have you meant something more introducing the node type?

> We don't make array values unique.  That might make query execution
> performance somewhat worse, and also makes selectivity estimation
> worse.  I suggest Andrei and/or Alena should implement making array
> values unique.
The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long
OR lists. Using the list_append_unique() comparator on such lists could
impact performance. I suggest sticking to the common rule and leaving
the responsibility on the user's shoulders.
At least, we should do this optimization later, in one pass, with
sorting elements before building the array. But what if we don't have a
sort operator for the type?

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-03-11 05:25:43 Re: "type with xxxx does not exist" when doing ExecMemoize()
Previous Message Hayato Kuroda (Fujitsu) 2024-03-11 05:03:07 RE: speed up a logical replica setup