using sequential scan instead of index for join with a union

From: Kent Tong <kent(dot)tong(dot)mo(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: using sequential scan instead of index for join with a union
Date: 2019-05-17 09:15:19
Message-ID: CAKs98dGxcPAyiN7Euw8wjiadVkNj92L+_txgpdVdEfRBDefk-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am joining the union of three tables with another table. Postgresql uses
the index when only two tables are in the union. If I add one more table to
the union, it switches to seq scan. Apparently it also uses the index when
only one table is joined.

The SQL is:
select * from (
SELECT 'NEWS' datatype, n.id, mbct_id FROM news n
union all
SELECT 'SPEECH' datatype, s.id, mbct_id FROM speech s
union all
SELECT 'NOTICE' datatype, notice.id, mbct_id FROM notice
) x join NBSMultiBroadcast y on x.mbct_id=y.id where y.zhtw_grp_bct between
'2019-05-10' and '2019-05-17';

The estimated number of rows is not off against the actual number of rows,
which is around 120. So, I don't really understand why PostgreSQL seems to
believe it should use Seq Scan due to a relatively large number of rows are
expected.

I am using v11.3:

PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg16.04+1) on i686-pc-linux-gnu, compiled
by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 32-bit

The output of explain analyze is:
Hash Join (cost=1153.01..6273.58 rows=134 width=1856) (actual
time=46.937..50.557 rows=120 loops=1)
Hash Cond: (n.mbct_id = y.id)
-> Append (cost=0.00..5043.33 rows=29422 width=48) (actual
time=0.015..42.237 rows=29422 loops=1)
-> Seq Scan on news n (cost=0.00..4588.30 rows=27430 width=48)
(actual time=0.015..35.902 rows=27430 loops=1)
-> Seq Scan on speech s (cost=0.00..26.26 rows=226 width=48)
(actual time=0.009..0.182 rows=226 loops=1)
-> Seq Scan on notice (cost=0.00..281.66 rows=1766 width=48)
(actual time=0.005..1.283 rows=1766 loops=1)
-> Hash (cost=1151.24..1151.24 rows=142 width=1808) (actual
time=2.466..2.466 rows=130 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 34kB
-> Index Scan using zhtw_grp_bct on nbsmultibroadcast y
(cost=0.29..1151.24 rows=142 width=1808) (actual time=2.279..2.396 rows=130
loops=1)
Index Cond: ((zhtw_grp_bct >= '2019-05-10
00:00:00'::timestamp without time zone) AND (zhtw_grp_bct <= '2019-05-17
00:00:00'::timestamp without time zone))
Planning Time: 0.749 ms
Execution Time: 50.637 ms

The output of explain analyze for just two tables in the union is:
Nested Loop (cost=0.57..5863.96 rows=126 width=1856) (actual
time=2.199..21.513 rows=103 loops=1)
-> Index Scan using zhtw_grp_bct on nbsmultibroadcast y
(cost=0.29..1151.24 rows=142 width=1808) (actual time=2.172..2.313 rows=130
loops=1)
Index Cond: ((zhtw_grp_bct >= '2019-05-10 00:00:00'::timestamp
without time zone) AND (zhtw_grp_bct <= '2019-05-17 00:00:00'::timestamp
without time zone))
-> Append (cost=0.29..33.17 rows=2 width=48) (actual time=0.035..0.146
rows=1 loops=130)
-> Index Scan using news_mbct_id_idx on news n (cost=0.29..6.33
rows=1 width=48) (actual time=0.004..0.005 rows=1 loops=130)
Index Cond: (mbct_id = y.id)
-> Seq Scan on speech s (cost=0.00..26.82 rows=1 width=48)
(actual time=0.139..0.139 rows=0 loops=130)
Filter: (y.id = mbct_id)
Rows Removed by Filter: 226
Planning Time: 0.639 ms
Execution Time: 21.604 m

The size of the three tables are 27430, 226 and 1766 respectively.

Many thanks for any help!
--
Kent Tong
IT author and consultant, child education coach

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sergei Kornilov 2019-05-17 09:23:19 Re: using sequential scan instead of index for join with a union
Previous Message Naik, Sameer 2019-05-17 06:42:23 RE: Re: Re: Generic Plans for Prepared Statement are 158155 times slower than Custom Plans