From: | Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu> |
---|---|
To: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Cc: | Manuel Rigger <rigger(at)nus(dot)edu(dot)sg> |
Subject: | Question about double table scans for a table |
Date: | 2023-07-27 08:17:58 |
Message-ID: | SEZPR06MB64949D07C54EA9053425A80E8A01A@SEZPR06MB6494.apcprd06.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi everyone,
Consider the query 11 in the TPC-H benchmark:
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
PARTSUPP,
SUPPLIER,
NATION
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
group by
ps_partkey
having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0001000000
from
PARTSUPP,
SUPPLIER,
NATION
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'MOZAMBIQUE'
)
order by
value desc;
PostgreSQL generates the following query plan:
Sort (cost=1798.52..1799.32 rows=320 width=36)
Sort Key: (sum((partsupp.ps_supplycost * (partsupp.ps_availqty)::numeric))) DESC
InitPlan 1 (returns $0)
-> Aggregate (cost=884.20..884.21 rows=1 width=32)
-> Hash Join (cost=12.40..877.00 rows=960 width=10)
Hash Cond: (partsupp_1.ps_suppkey = supplier_1.s_suppkey)
-> Seq Scan on partsupp partsupp_1 (cost=0.00..765.00 rows=24000 width=14)
-> Hash (cost=12.25..12.25 rows=12 width=4)
-> Hash Join (cost=1.32..12.25 rows=12 width=4)
Hash Cond: (supplier_1.s_nationkey = nation_1.n_nationkey)
-> Seq Scan on supplier supplier_1 (cost=0.00..10.00 rows=300 width=8)
-> Hash (cost=1.31..1.31 rows=1 width=4)
-> Seq Scan on nation nation_1 (cost=0.00..1.31 rows=1 width=4)
Filter: (n_name = 'MOZAMBIQUE'::bpchar)
-> HashAggregate (cost=886.60..901.00 rows=320 width=36)
Group Key: partsupp.ps_partkey
Filter: (sum((partsupp.ps_supplycost * (partsupp.ps_availqty)::numeric)) > $0)
-> Hash Join (cost=12.40..877.00 rows=960 width=14)
Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Seq Scan on partsupp (cost=0.00..765.00 rows=24000 width=18)
-> Hash (cost=12.25..12.25 rows=12 width=4)
-> Hash Join (cost=1.32..12.25 rows=12 width=4)
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Seq Scan on supplier (cost=0.00..10.00 rows=300 width=8)
-> Hash (cost=1.31..1.31 rows=1 width=4)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4)
Filter: (n_name = 'MOZAMBIQUE'::bpchar)
While TiDB has the following query plan:
Projection_63
└─Sort_64
└─Selection_66
└─HashAgg_67
└─Projection_94
└─HashJoin_71
├─HashJoin_84(Build)
│ ├─TableReader_89(Build)
│ │ └─Selection_88
│ │ └─TableFullScan_87
│ └─TableReader_86(Probe)
│ └─TableFullScan_85
└─TableReader_91(Probe)
└─TableFullScan_90
...
Both query plans include different numbers of table scans, as highlighted in red color. PostgreSQL uses six table scans, while TiDB has only three. I understand that the table scanning operation is expensive and query plans are typically more efficient with fewer table scans. My question is why PostgreSQL uses six table scans to scan each table twice? Is it a more efficient query plan, or does this indicate an optimization that is not performed by PostgreSQL?
Best regards,
Jinsheng Ba
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-07-27 14:00:29 | Re: BUG #18038: Aliases removed from view definitions |
Previous Message | PG Bug reporting form | 2023-07-27 06:29:32 | BUG #18038: Aliases removed from view definitions |