From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | feature request: IN clause optimized through append nodes with UNION ALL |
Date: | 2023-01-20 22:24:31 |
Message-ID: | CAHyXU0z_RJYXj-UGHcXLV00xo_b00TkELmnZS8QuL-=742=4pg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
In the script below, the presence of an IN clause forces the internal
components of the UNION ALL clause to fully compute even though they are
fully optimizable. = ANY doesn't have this issue, so I wonder if there is
any opportunity to convert the 'slow' variant (see below) to the 'fast'
variant. thank you!
merlin
drop table a cascade;
drop table b cascade;
drop table c cascade;
create table a (a_id int primary key);
create table b (b_id int primary key, a_id int references a);
create table c (c_id int primary key, b_id int references b);
insert into a select s from generate_series(1, 50000) s;
insert into b select s, (s % 50000 ) + 1 from generate_series(1, 100000) s;
insert into c select s, (s % 100000 ) + 1 from generate_series(1, 1000000)
s;
create index on b (a_id, b_id);
create index on c (b_id, c_id);
analyze a;
analyze b;
analyze c;
create temp table d (a_id int);
insert into d values (99);
insert into d values (999);
insert into d values (9999);
analyze d;
create or replace view v as
select * from a join b using(a_id) join c using(b_id)
union all select * from a join b using(a_id) join c using(b_id);
explain analyze select * from v where a_id in (select a_id from d); --
this is slow
explain analyze select * from v where a_id = any(array(select a_id from
d)); -- this is fast
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2023-01-20 23:12:01 | Re: pgindent vs variable declaration across multiple lines |
Previous Message | Tomas Vondra | 2023-01-20 21:50:56 | Re: pg_stats and range statistics |