From: | Sjors Gielen <sjors(at)sjorsgielen(dot)nl> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE |
Date: | 2024-05-31 20:29:02 |
Message-ID: | 41CB6073-8821-4013-9AE9-782F64669502@sjorsgielen.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Dear all,
FYI - I have just tried the below on PostgreSQL 17-beta1 and the issue still occurs.
To be clear - this is not a regression. This appears to be a query planner bug (?) which reproduces in Postgres 15, 16 and now 17-beta1.
Best,
Sjors
> Op 26 feb 2024, om 14:11 heeft PG Bug reporting form <noreply(at)postgresql(dot)org> het volgende geschreven:
>
> The following bug has been logged on the website:
>
> Bug reference: 18365
> Logged by: Sjors Gielen
> Email address:
> PostgreSQL version: 16.2
> Operating system: Linux (native amd64 and Docker for Mac arm64)
> Description:
>
> Dear all,
>
> I have run into an issue where a query with a CTE performs a sequential scan
> on a large table (42M rows, ~1min on our prod), while explicitly
> materializing the CTE simply performs an index only scan (~2s). When I set
> `enable_seqscan=off` and compare the costs, it turns out the query planner
> grossly overestimates the cost of a Nested Loop compared to the (as far as I
> can tell) exact same Nested Loop when the CTE is materialized. I know that
> the query planner acts on heuristics, so this might not be considered a bug,
> but the cost values are so wildly different for what should be essentially
> the same operation, that it might warrant further investigation.
>
> I can reproduce the issue on PostgreSQL 15.2, 15.6, 16.2 and 17devel as of
> 20240223.1636.gd360e3c.
>
> I have reported the issue, the schema, the query and the query plan outputs
> at
> <https://dba.stackexchange.com/questions/335570/why-is-postgresql-performing-a-sequential-scan-except-when-my-cte-is-materializ>.
> Also, in order to facilitate reproduction, I have uploaded two reproduction
> scripts and a 1 GB sample of my dataset which still reproduces the issue at
> <https://sjorsgielen.nl/psql-bug-report.tar.gz>. `bring-up.sh` runs a given
> version of the PostgreSQL Docker image listening on local port 15432.
> `run-test.sh` creates the `testing` schema, creates the DDL, and imports the
> data from data.sql.bz2. It should be easy to run `run-test.sh` against any
> Postgres server by changing the DSN at the top of the script. The `testing`
> schema is expected not to exist.
>
> By running `./bring-up.sh 16.2 && ./run-test.sh`, some minutes later, you
> should see the output of three `EXPLAIN (ANALYZE, BUFFERS, VERBOSE)`. I
> would expect the cost of the Nested Loop in the first case (materialized
> view), which is `2246..76657` (around 75k), to be more or less the same as
> that of the Nested Loop in the third case (sequential scan disabled), which
> is 230310..797581 (~570k). Because this cost is much higher, even surpassing
> the Hash Join cost of ~290k, PostgreSQL decides to use a sequential scan in
> the second case (non-materialized view) which ends up causing a slowdown of
> ~6.5 times.
>
> Is this a bug?
>
> Thank you,
> Sjors Gielen
>
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2024-06-01 12:00:01 | BUG #18490: Assert in comparetup_index_btree_tiebreak() fails when pg_class reindexed during a table creation |
Previous Message | Abel Abraham Camarillo Ojeda | 2024-05-31 19:16:33 | Re: BUG #18489: CONSUMO CPU |