From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
Cc: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Subject: | Re: Replace IN VALUES with ANY in WHERE clauses during optimization |
Date: | 2025-04-01 12:07:59 |
Message-ID: | CAPpHfds-7eJ3ZMgyeVrMfC5E1nTHD4Bp0ch5MZhrYSoiCfERXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Alena!
On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
wrote:
> 4.1) explain analyze SELECT ten
>
> FROM onek t WHERE unique1 IN ( VALUES (0), ((2 IN ( SELECT unique2 FROM
> onek c WHERE c.unique2 in ((values(0),(2))))::integer)) );
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
> Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
> time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY (ARRAY[0,
> ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows Removed by Filter:
> 10005 *Buffers: shared hit=110* SubPlan 1 -> Seq Scan on onek c
> (cost=0.00..180.10 rows=3 width=4) (actual time=0.022..4.951 rows=2.00
> loops=1) Filter: (unique2 = ANY ('{0,2}'::integer[])) Rows Removed by
> Filter: 10006 *Buffers: shared hit=55* Planning: Buffers: shared hit=6
> dirtied=1 Planning Time: 0.502 ms Execution Time: 13.348 ms (13 rows)
>
> The query plan without our patch:
>
> --------------------------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual time=5.072..9.076
> rows=3.00 loops=1) Hash Cond: (t.unique1 = "*VALUES*".column1) *Buffers:
> shared hit=55 read=55* -> Seq Scan on onek t (cost=0.00..155.08
> rows=10008 width=10) (actual time=0.145..1.802 rows=10008.00 loops=1) *
> Buffers: shared hit=52 read=3* -> Hash (cost=0.03..0.03 rows=2 width=4)
> (actual time=4.908..4.912 rows=2.00 loops=1) Buckets: 1024 Batches: 1
> Memory Usage: 9kB *Buffers: shared hit=3 read=52* -> Values Scan on
> "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual time=0.003..4.901
> rows=2.00 loops=1) *Buffers: shared hit=3 read=52* SubPlan 1 -> Hash Semi
> Join (cost=0.05..181.42 rows=2 width=4) (actual time=0.036..4.861 rows=2.00
> loops=1) Hash Cond: (c.unique2 = "*VALUES*_1".column1) *Buffers: shared
> hit=3 read=52* -> Seq Scan on onek c (cost=0.00..155.08 rows=10008
> width=4) (actual time=0.009..2.120 rows=10008.00 loops=1) *Buffers:
> shared hit=3 read=52* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
> time=0.006..0.008 rows=2.00 loops=1) Buckets: 1024 Batches: 1 Memory Usage:
> 9kB -> Values Scan on "*VALUES*_1" (cost=0.00..0.03 rows=2 width=4) (actual
> time=0.001..0.002 rows=2.00 loops=1) Planning: Buffers: shared hit=102
> read=22 Planning Time: 1.853 ms Execution Time: 9.281 ms (23 rows)
>
I think I managed to understand what is going on.
When we run a query with SOAP over a constant array
then convert_saop_to_hashed_saop_walker() provides acceleration with
hashing.
# explain analyze select * from test where val IN (5000, 4000, 9000, 2000,
1000, 140050);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..21925.00 rows=6 width=4) (actual
time=2.015..223.984 rows=6.00 loops=1)
Filter: (val = ANY ('{5000,4000,9000,2000,1000,140050}'::integer[]))
Rows Removed by Filter: 999994
Buffers: shared hit=2228 read=2197
Planning Time: 0.246 ms
Execution Time: 224.036 ms
(6 rows)
But when there is expression or subselect, then hashing doesn't work and
query becomes slower.
# explain analyze select * from test where val IN (5000, 4000, 9000, 2000,
1000, (select 140050));
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..21925.01 rows=6 width=4) (actual
time=0.904..396.495 rows=6.00 loops=1)
Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, (InitPlan
1).col1]))
Rows Removed by Filter: 999994
Buffers: shared hit=2292 read=2133
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1.00 loops=1)
Planning Time: 0.160 ms
Execution Time: 396.538 ms
(8 rows)
In contrast, hashing is always available with VALUES.
# explain analyze select * from test where val in (VALUES (5000), (4000),
(9000), (2000), (1000), ((select 140050)));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.16..17050.23 rows=6 width=4) (actual
time=1.589..225.061 rows=6.00 loops=1)
Hash Cond: (test.val = "*VALUES*".column1)
Buffers: shared hit=2356 read=2069
InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.003
rows=1.00 loops=1)
-> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4) (actual
time=0.460..91.912 rows=1000000.00 loops=1)
Buffers: shared hit=2356 read=2069
-> Hash (cost=0.08..0.08 rows=6 width=4) (actual time=0.049..0.050
rows=6.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=4)
(actual time=0.009..0.032 rows=6.00 loops=1)
Planning Time: 0.627 ms
Execution Time: 225.155 ms
(12 rows)
I think we should allow our transformation only when the array is constant
(attached patchset). In future we may implement dynamic SAOP hashing, and
then allow our transformation in more cases.
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size |
---|---|---|
v8-0001-Extract-make_SAOP_expr-function-from-match_orclau.patch | application/x-patch | 7.3 KB |
v8-0002-Convert-x-IN-VALUES-.-to-x-ANY-.-then-appropriate.patch | application/x-patch | 32.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2025-04-01 12:31:19 | Re: Hashed IN only applied to first encountered IN |
Previous Message | Daniel Verite | 2025-04-01 12:06:28 | Re: Add partial :-variable expansion to psql \copy |