Re: Index scan is not pushed down to union all subquery

From: Marian Wendt <marian(dot)wendt(at)yahoo(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Index scan is not pushed down to union all subquery
Date: 2023-10-05 09:33:57
Message-ID: 3089ff2c-2c50-4503-bd13-80452562756f@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>>> Hi,
>>>
>>> Is it intended that indexes are not pushed down to union all
>>> subqueries if even a single select contains a where clause? Is this
>>> just not implemented, is it impossible to implement or am I doing
>>> something wrong?
>>>
>>> The following query does a SeqScan for "bikes" and "cars" tables even
>>> though IndexScan on their column "dealer_name" would be a magnitude
>>> faster. (Schema with sample data at the bottom of this mail)
>>>
>>> WITH
>>> targets as (
>>> select 'bike' vehicle, id, dealer_name FROM bikes WHERE frame_size = 52
>>> union all
>>> select 'car' vehicle, id, dealer_name FROM cars
>>> -- In the real use case I have here dozens of tables
>>> )
>>> SELECT
>>> dealers.name dealer,
>>> targets.vehicle,
>>> targets.id
>>> FROM
>>> dealers
>>> JOIN targets
>>> ON dealers.name = targets.dealer_name
>>> WHERE dealers.id in (54,12,456,315,468)
>>>
>>>
>>> If the WHERE clause from the "bikes" subquery is removed then first
>>> Index Scan on dealers_pk is made and then Bitmap Index Scans on
>>> "bikes" and "cars" table using indexes on "dealer_name" columns.
>>>
>>>
>>> ---------------------------------------------------------------
>>> -- Available also at:https://www.db-fiddle.com/f/cEXt8HXSaQzsQ2yBDA2Z4H/7
>>> CREATE TABLE dealers AS
>>> SELECT
>>> id,
>>> (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM
>>> generate_series(1, 4) WHERE id>0) name
>>> FROM generate_series(1, 1000) AS id
>>> ;
>>> ALTER TABLE dealers ADD primary key (id);
>>> CREATE INDEX ON dealers(name);
>>>
>>>
>>> CREATE TABLE bikes AS
>>> SELECT
>>> generate_series AS id,
>>> (SELECT name FROM dealers WHERE dealers.id = (SELECT
>>> (random()*1000)::int WHERE generate_series>0)) AS dealer_name,
>>> (random()*12+50)::int as frame_size
>>> FROM generate_series(1, 100000);
>>> ALTER TABLE bikes ADD primary key (id);
>>> CREATE INDEX ON bikes(dealer_name);
>>>
>>>
>>> CREATE TABLE cars AS
>>> SELECT
>>> generate_series as id,
>>> (SELECT name FROM dealers WHERE dealers.id = (SELECT
>>> (random()*1000)::int WHERE generate_series>0)) AS dealer_name,
>>> (random()*7+14)::int as wheel_size
>>> FROM generate_series(1, 100000);
>>> ALTER TABLE cars ADD primary key (id);
>>> CREATE INDEX ON cars(dealer_name);
>>>
>>>
>>> ANALYZE;
>>>
>>>
>>> --
>>> - Lauri
>>>
>>>
>>
>> With an INNER JOIN, both tables must be fully checked/matched (check
>> using EXPLAIN ANALYSIS -> HashJoin), so the index cannot be used here.
>> Sorry, didn't consider the WITH part. Please share the detailed query
>> plan for more info.
> The "bikes" subquery uses field "frame_size" in WHERE clause but the
> field does not have an index...
ADD: Consider whether it might make sense to take a more generalist
approach by only having one entity vehicle with the distinction "car",
"bike", etc...?
ADD: Consider to do more complex "detailed" SELECTs that are unioned (if
that is really needed)?
--
regards, marian wendt

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2023-10-05 09:39:19 Re: Multiple inserts with two levels of foreign keys
Previous Message Marian Wendt 2023-10-05 09:20:10 Re: Index scan is not pushed down to union all subquery