Re: Index scan is not pushed down to union all subquery

From: Lauri Kajan <lauri(dot)kajan(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Index scan is not pushed down to union all subquery
Date: 2023-10-05 13:25:16
Message-ID: CAKWoFMK0Gw4JSqMt1FOwtNDehkMxBaKipA7_W43nv=k=y9peRg@mail.gmail.com
Lists: pgsql-general

On Thu, Oct 5, 2023 at 12:30 PM Marian Wendt <marian(dot)wendt(at)yahoo(dot)com> wrote:
>
> With an INNER JOIN, both tables must be fully checked/matched (check using EXPLAIN ANALYZE -> HashJoin), so the index cannot be used here.
> Sorry, didn't consider the WITH part. Please share the detailed query plan for more info.

Currently it reads all 100 000 bikes, filters out 91 778 rows, then appends
all 100 000 cars. It then uses a hash join to match these 108 222 rows to
the 5 dealers.

In my opinion the index could be used. If I were the database, I would
first find the names of the dealers (in this case 5 dealers out of 1000)
using the dealers' primary key, then find all the bikes matching those
dealer names using the index on the dealer_name column (in the sample data
~500 bikes out of 100 000). Then I would filter those bikes by frame_size
(filtering out ~460 bikes), and finally append all the cars matching the
dealer_name condition (~500 cars out of 100 000), again found using the
index on the dealer_name column.
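For reference, the shape of the query under discussion is roughly the following (a sketch only: the CTE name and SELECT lists are illustrative, since the exact original query is not quoted in this message):

```sql
-- Illustrative sketch of the query shape, not the exact original query.
WITH vehicles AS (
    SELECT dealer_name, name
    FROM bikes
    WHERE frame_size = 52   -- extra filter applied to bikes only
    UNION ALL
    SELECT dealer_name, name
    FROM cars
)
SELECT v.*
FROM dealers d
JOIN vehicles v ON v.dealer_name = d.name
WHERE d.id IN (54, 12, 456, 315, 468);
```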

Here's the current query plan:

Hash Join  (cost=21.53..4511.74 rows=542 width=41) (actual time=0.233..27.507 rows=479 loops=1)
  Hash Cond: (bikes.dealer_name = dealers.name)
  ->  Append  (cost=0.00..3943.40 rows=108280 width=41) (actual time=0.009..20.351 rows=108222 loops=1)
        ->  Seq Scan on bikes  (cost=0.00..1826.00 rows=8280 width=41) (actual time=0.009..6.074 rows=8222 loops=1)
              Filter: (frame_size = 52)
              Rows Removed by Filter: 91778
        ->  Seq Scan on cars  (cost=0.00..1576.00 rows=100000 width=41) (actual time=0.011..9.175 rows=100000 loops=1)
  ->  Hash  (cost=21.46..21.46 rows=5 width=5) (actual time=0.041..0.043 rows=5 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using dealers_pkey on dealers  (cost=0.28..21.46 rows=5 width=5) (actual time=0.009..0.038 rows=5 loops=1)
              Index Cond: (id = ANY ('{54,12,456,315,468}'::integer[]))
Planning Time: 0.152 ms
Execution Time: 27.558 ms

In my dreams the plan would be something like this:
Nested Loop
  ->  Index Scan using dealers_pkey on dealers
        Index Cond: (id = ANY ('{54,12,456,315,468}'::integer[]))
  ->  Append
        ->  Bitmap Heap Scan on bikes
              Recheck Cond: (dealer_name = dealers.name)
              Filter: (frame_size = 52)
              Rows Removed by Filter: 91
              ->  Bitmap Index Scan on bikes_dealer_name_idx
                    Index Cond: (dealer_name = dealers.name)
        ->  Bitmap Heap Scan on cars
              Recheck Cond: (dealer_name = dealers.name)
              ->  Bitmap Index Scan on cars_dealer_name_idx
                    Index Cond: (dealer_name = dealers.name)

If I drop the WHERE condition on bikes I get exactly that plan, just
without the Filter node. Without the frame_size filter the execution time
is 1.028 ms.
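One way to get the nested-loop shape by hand (a workaround sketch, not what the planner should have to be told) is to move the union into a LATERAL subquery, so the dealer_name condition appears inside each branch where the indexes on dealer_name can be used:

```sql
-- Workaround sketch: the LATERAL subquery makes the dealer lookup the outer
-- side of a nested loop, and the dealer_name = d.name condition lands inside
-- each UNION ALL branch, where the dealer_name indexes apply.
SELECT v.*
FROM dealers d
CROSS JOIN LATERAL (
    SELECT dealer_name, name
    FROM bikes
    WHERE dealer_name = d.name
      AND frame_size = 52
    UNION ALL
    SELECT dealer_name, name
    FROM cars
    WHERE dealer_name = d.name
) v
WHERE d.id IN (54, 12, 456, 315, 468);
```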

On Thu, Oct 5, 2023 at 12:38 PM Marian Wendt <marian(dot)wendt(at)yahoo(dot)com> wrote:
>
> The "bikes" subquery uses field "frame_size" in the WHERE clause but the field does not have an index...
> ADD: Consider whether it might make sense to take a more generalist approach by only having one entity "vehicle" with the distinction "car", "bike", etc.?
> ADD: Consider doing more complex "detailed" SELECTs that are unioned (if that is really needed)?

I don't want to index frame_size since I never look up rows by it alone;
it should only be used to filter down to the bikes of interest.
This is just a sample case; in my real use case I want to validate my
"source" objects against 80 "target" tables. I could run a separate query
for each target table, but with over 80 queries that adds overhead in
network latency etc.
And if I join each target table to the dealers separately and then combine
the results, that is also slow, since it accesses the dealers_pkey index
80 times.

On Thu, Oct 5, 2023 at 12:47 PM Dominique Devienne <ddevienne(at)gmail(dot)com>
wrote:
>
> ...
> Lauri, you haven't said which version of PostgreSQL. Did you assume the
latest v16?

Thanks, Dominique, for clarifying the question.
I have tested this on Postgres 12 and also on the latest, 16.0.
