Re: Question about double table scans for a table

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Manuel Rigger <rigger(at)nus(dot)edu(dot)sg>
Subject: Re: Question about double table scans for a table
Date: 2023-07-27 20:38:35
Message-ID: CAApHDvpBJprgHT0R6LkENsH78gAhegbf0Zv5XokXpF8LBYNE9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, 27 Jul 2023 at 20:49, Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu> wrote:
> Both query plans include different numbers of table scans, as highlighted in red color. PostgreSQL uses six table scans, while TiDB has only three. I understand that the table scanning operation is expensive and query plans are typically more efficient with fewer table scans. My question is why PostgreSQL uses six table scans to scan each table twice? Is it a more efficient query plan, or does this indicate an optimization that is not performed by PostgreSQL?

The PostgreSQL planner does not do any scan deduplication like this.
You could likely write a query containing a WITH MATERALIZE that runs
the query with the GROUP BY, then reference the CTE in both the main
query and also the HAVING clause. e.g, something like:

explain with cte as materialized (select ps_partkey, sum(ps_supplycost
* ps_availqty) as cost from partsupp ... other joins...) select * from
cte having cost > (select sum(cost) from cte);

How much more efficient that'll be will depend on the number of distinct parts.

David

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-07-27 21:05:31 BUG #18040: PostgreSQL does not report its version correctly
Previous Message Jeff Davis 2023-07-27 18:21:25 search_path not recomputed when role name changes