Quick Links

Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Sjors Gielen <sjors(at)sjorsgielen(dot)nl>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
Date:	2024-06-03 22:37:04
Message-ID:	CAApHDvq78Xdjvq-9tAawNx74pUpDLNJHSH=riK3-gv5zv2ww_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On Tue, 4 Jun 2024 at 02:50, Sjors Gielen <sjors(at)sjorsgielen(dot)nl> wrote:
> I am not sure I am using the correct list for this - please let me know if I
> should move this to pgsql-hackers or pgsql-performance.

This is the correct place to report bugs.

> Should the correct plan not be simply this?
>
> -> Nested Loop
> -> CTE Scan on last_semester_per_student (cost=0.00..440.92 rows=22046 width=8) (actual time=9.947..13.157 rows=21826 loops=1)
> -> Index Only Scan using idx2 on public.student_courses (cost=0.29..3.54 rows=6 width=8) (actual time=0.001..0.001 rows=6 loops=21826)

You're right that the query planner injected a uniquification step
into the plan unnecessarily. It did this to allow the semi-join to be
transformed into a join so it could swap the join order.

We won't class this issue as a bug as there is no code in PostgreSQL
that is malfunctioning. We simply don't have anything in place in the
current versions of PostgreSQL to detect which keys a Path is unique
on which the planner could use to skip performing the uniquification
step.

> I am not certain, but it appears that the code generating this path might be in
> pathnode.c create_unique_path, which already checks whether its child node
> yields unique rows, but only supports relations or subqueries for this check,
> not CTEs. Therefore, it falls back to a naive estimation which is off by a
> factor of 100x.

The row estimation part has been improved in the yet-to-be-released
PostgreSQL 17. The improvement won't be backpatched into earlier
versions.

> Am I correct in thinking this code would be improved by supporting CTE
> subqueries, as well, transporting the uniqueness information from the
> CTE-producing node (in this case, the outer HashAggregate)?

I'm unsure what you mean by the first part since CTEs already support
that. For the "transporting the uniqueness information" part, yes, if
the planner tracked that it could produce a plan that could be
executed more efficiently.

> Would a change of this nature be accepted upstream, if I were to put some time
> and effort into it?

There has been some effort in [1] to improve this. I've not had enough
spare time to look into how that's progressing. If you're keen to see
PostgreSQL produce better plans for this type of thing, you're very
welcome to catch up with what's been done so far and assess how you
can best assist with progressing this further. Showing interest on the
thread and testing the latest patch is likely a very good start.

David

[1] https://postgr.es/m/7mlamswjp81p.fsf@e18c07352.et15sqa

In response to

Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE at 2024-06-02 21:36:06 from Sjors Gielen

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Michael Paquier	2024-06-03 22:40:11	Re: BUG #18483: Segmentation fault in tests modules
Previous Message	Thomas Munro	2024-06-03 21:22:22	Re: [EXTERNAL] Re: Windows Application Issues \| PostgreSQL \| REF # 48475607