Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-10-04 09:19:03
Message-ID: 33cfe17d-5462-4c82-9b5e-2865cf870bcf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 03.10.2024 23:15, Peter Geoghegan wrote:
> I do think that this patch got a lot better, and simpler, but I'm a
> little worried about it not covering cases that are only very slightly
> different to the ones that you're targeting. It's easiest to see what
> I mean using an example.
>
> After the standard regression tests have run, the following tests can
> be run from psql (this uses the recent v40 revision):
>
> pg(at)regression:5432 =# create index on tenk1(four, ten); -- setup
> CREATE INDEX
>
> Very fast INT_MAX query, since we successful use the transformation
> added by the patch:
>
> pg(at)regression:5432 =# explain (analyze,buffers) select * from tenk1
> where four = 1 or four = 2_147_483_647 order by four, ten limit 5;
> ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY
> PLAN │
> ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Limit (cost=0.29..1.73 rows=5 width=244) (actual time=0.011..0.014
> rows=5 loops=1) │
> │ Buffers: shared hit=4
> │
> │ -> Index Scan using tenk1_four_ten_idx on tenk1
> (cost=0.29..721.25 rows=2500 width=244) (actual time=0.011..0.012
> rows=5 loops=1) │
> │ Index Cond: (four = ANY ('{1,2147483647}'::integer[]))
> │
> │ Index Searches: 1
> │
> │ Buffers: shared hit=4
> │
> │ Planning Time: 0.067 ms
> │
> │ Execution Time: 0.022 ms
> │
> └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (8 rows)
>
> Much slower query, which is not capable of applying the transformation
> due only to
> the fact that I've "inadvertently" mixed together multiple types (int4
> and int8):
>
> pg(at)regression:5432 =# explain (analyze,buffers) select * from tenk1
> where four = 1 or four = 2_147_483_648 order by four, ten limit 5;
> ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY
> PLAN │
> ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Limit (cost=0.29..2.08 rows=5 width=244) (actual time=0.586..0.588
> rows=5 loops=1) │
> │ Buffers: shared hit=1368
> │
> │ -> Index Scan using tenk1_four_ten_idx on tenk1
> (cost=0.29..900.25 rows=2500 width=244) (actual time=0.586..0.587
> rows=5 loops=1) │
> │ Index Searches: 1
> │
> │ Filter: ((four = 1) OR (four = '2147483648'::bigint))
> │
> │ Rows Removed by Filter: 2500
> │
> │ Buffers: shared hit=1368
> │
> │ Planning Time: 0.050 ms
> │
> │ Execution Time: 0.595 ms
> │
> └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (9 rows)
>
> Do you think this problem can be fixed easily? This behavior seems
> surprising, and is best avoided. Performance cliffs that happen when
> we tweak one detail of a query just seem worth avoiding on general
> principle.
>
> Now that you're explicitly creating RestrictInfos for a particular
> index, I suppose that it might be easier to do this kind of thing --
> you have more context. Perhaps the patch can be made to recognize
> a mix of constants like this as all being associated with the same
> B-Tree operator family (the opfamily that the input opclass belongs
> to)? Perhaps the constants could all be normalized to the same type via
> casts/coercions into the underlying B-Tree input opclass -- that
> extra step should be correct ("64.1.2. Behavior of B-Tree Operator Classes"
> describes certain existing guarantees that this step would need to rely
> on).
>
> Note that the patch already works in cross-type scenarios, with
> cross-type operators. The issue I've highlighted is caused by the use
> of a mixture of types among the constants themselves -- the patch
> wants an array with elements that are all of the same type, which it
> can't quite manage. And so I can come up with a cross-type variant
> query that *can* still use a SAOP as expected with v40, despite
> involving a cross-type = btree operator:
>
> pg(at)regression:5432 [2181876]=# explain (analyze,buffers) select * from
> tenk1 where four = 2_147_483_648 or four = 2_147_483_649 order by
> four, ten limit 5;
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY
> PLAN │
> ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Limit (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.005
> rows=0 loops=1) │
> │ Buffers: shared hit=2
> │
> │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..6.53
> rows=1 width=244) (actual time=0.004..0.004 rows=0 loops=1) │
> │ Index Cond: (four = ANY
> ('{2147483648,2147483649}'::bigint[]))
> │
> │ Index Searches: 1
> │
> │ Buffers: shared hit=2
> │
> │ Planning Time: 0.044 ms
> │
> │ Execution Time: 0.011 ms
> │
> └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (8 rows)
>
> The fact that this third and final example works as expected makes me
> even more convinced that the second example should behave similarly.
>
Yes, I agree with you that it should be added in the feature but in the
future thread.

The patch does not solve all the problems we planned for, as the
previous patch did (discussed here [0]), but it also does not cause the
performance problems that
were associated with building a suboptimal plan.

Furthermore I think this issue, like the one noted here [0], can be
fixed in a way I proposed before [1], but I assume it is better resolved
in the next thread related to the patch.

[0]
https://www.postgresql.org/message-id/CAPpHfdvF864n%3DLzmjd2XBi9TwboZvrhRtLSt2hCP%2BJVUv6XKzg%40mail.gmail.com

[1]
https://www.postgresql.org/message-id/985f2924-9769-4927-ad6e-d430c394054d%40postgrespro.ru

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-10-04 10:09:44 Re: Logical Replication of sequences
Previous Message Alena Rybakina 2024-10-04 09:15:56 Re: Replace IN VALUES with ANY in WHERE clauses during optimization