Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: postgresql(at)sjorsgielen(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
Date: 2024-06-03 23:07:20
Message-ID: CAApHDvqBTtdtE_w__HeOvzen6UqKUVzcNpTm74_ioJOfdOJ50A@mail.gmail.com
Lists: pgsql-bugs

On Tue, 27 Feb 2024 at 02:44, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> I have run into an issue where a query with a CTE performs a sequential scan
> on a large table (42M rows, ~1min on our prod), while explicitly
> materializing the CTE simply performs an index only scan (~2s). When I set
> `enable_seqscan=off` and compare the costs, it turns out the query planner
> grossly overestimates the cost of a Nested Loop compared to the (as far as I
> can tell) exact same Nested Loop when the CTE is materialized. I know that
> the query planner acts on heuristics, so this might not be considered a bug,
> but the cost values are so wildly different for what should be essentially
> the same operation, that it might warrant further investigation.

> Is this a bug?

I don't see any bugs. What seems to be going on is that with the
materialized CTE, the planner underestimates the number of rows
remaining after making the CTE results distinct so that it can use a
plain join rather than a semi-join.

That's seen in:

-> HashAggregate (cost=2576.28..2679.33 rows=10305 width=16) (actual
time=765.793..893.761 rows=166060 loops=1)
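For anyone following along, the transformation being described is
roughly the following (a sketch only; the table and column names here
are hypothetical, not taken from the reported query):

```sql
-- Semi-join form: each outer row qualifies if at least one match exists.
SELECT s.*
FROM stock s
WHERE s.product_id IN (SELECT c.product_id FROM cte_result c);

-- The planner may instead unique-ify the inner side first (the
-- HashAggregate seen above) and then run an ordinary join.  The cost of
-- this plan depends heavily on how many distinct rows the planner
-- expects the aggregate to produce:
SELECT s.*
FROM stock s
JOIN (SELECT DISTINCT product_id FROM cte_result) c
  ON c.product_id = s.product_id;
```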

Due to that row underestimation, the planner thinks a parameterized
nested loop is the best way to join the two relations, as it thinks the
stock_history_date_product_id_idx index only needs to be looked up
10305 times rather than 166060 times. With the non-materialized
version, the planner estimates 103051 lookups. That causes it to think
the Nested Loop -> index scan on stock_history_date_product_id_idx is
too expensive and that a hash join with a seq scan is cheaper. As it
turns out, looking up the index *is* faster, even when that's done
166060 times.
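To compare the two strategies directly on a given machine, the plans
can be obtained side by side (a sketch; the CTE body and names are
placeholders for the real query):

```sql
-- Plan A: let the planner inline the CTE (the default since PG12).
EXPLAIN (ANALYZE, BUFFERS)
WITH cte AS (SELECT product_id FROM some_table WHERE some_condition)
SELECT ...;

-- Plan B: force materialization, as the reporter did.
EXPLAIN (ANALYZE, BUFFERS)
WITH cte AS MATERIALIZED (SELECT product_id FROM some_table WHERE some_condition)
SELECT ...;

-- Optionally discourage the seq-scan plan to see the costs the planner
-- assigns to the index-based alternative, then restore the default:
SET enable_seqscan = off;
-- ... re-run the EXPLAINs ...
RESET enable_seqscan;
```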

The two parameters that drive the planner's decision here are
random_page_cost (you might want to consider lowering that) and
effective_cache_size. A rough guideline for effective_cache_size is
75% of the machine's RAM, though it depends a lot on your
shared_buffers setting and what else runs concurrently on the
machine. Some people have found that lowering random_page_cost as low
as 1.1 helps. The default is 4.0, which has remained unchanged since
the HDD days; for SSDs, it's often too large.
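As a sketch, those settings can be tried per session first and only
persisted once the plans improve (the values below are illustrative
examples, not recommendations for any particular machine):

```sql
-- Experiment in the current session only:
SET random_page_cost = 1.1;
SET effective_cache_size = '24GB';   -- e.g. ~75% of a 32GB machine

-- If the plans improve, persist the settings (requires superuser),
-- then reload the configuration:
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_cache_size = '24GB';
SELECT pg_reload_conf();
```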

I've attached the EXPLAINs I trimmed down and compared to reach this conclusion.

David

Attachment Content-Type Size
bug_18365_explain.txt text/plain 3.9 KB
