Strange sort node/explain result

From: "Gunnar \"Nick\" Bluth" <gunnar(dot)bluth(at)pro-open(dot)de>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: corrado(at)ficicchia(dot)net
Subject: Strange sort node/explain result
Date: 2022-10-31 14:20:15
Message-ID: 587a301e-478d-1cbc-288f-6ea77086b097@pro-open.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello all!

Last week at pgconf.eu, my good friend Corrado aka. Dino (in CC) showed
me a plan he couldn't make sense of.
I stood there, scratching my head and couldn't make sense of it just the
same.
What comforted me somehow is that nobody else we showed it to, including
Peter Geoghegan (but to be fair, it was on day 3 of the conference ;-)
could explain what's going on there.
So we follow Peter's advise and post this here.

Background: Dino played a bit in a test DB, wondering if putting all the
extract() values into a CTE when querying the large timeseries table
would save time compared to generating them on the fly or storing them
in the timeseries table (as it happens right now).

Mind you, this inquiry is not about if that's a good idea, the query's
performance or even the planner's choice of a merge join!

What puzzles us is the part where the CTE "oneyear" somehow explodes
into a sort node of almost 10 mio (but not the same amount as the index
scan emits!) rows, taking ~ 0.4 seconds but only using 4x-5x kB of memory:

-> Sort (cost=69.83..72.33 rows=1000 width=4) (actual
time=0.418..448.397 rows=9855001 loops=1)
Output: y.d
Sort Key: y.d
Sort Method: quicksort Memory: 42kB
-> CTE Scan on oneyear y (cost=0.00..20.00 rows=1000 width=4)
(actual time=0.079..0.383 rows=366 loops=1)
Output: y.d

Full plan is here: https://explain.depesz.com/s/B3mN

Testcase to reproduce is attached. It uses roughly the same amount of
rows as the original data.

I tried this on 9.6, 10, 11, 12, 14 & 15 and it's always showing the
same effect.

We also tried to "broaden" everything (adding more columns to the CTE,
fetching more or less from the main table, etc.) in Berlin, but the
outcome is rock solid, so to speak. Even the memory footprint of the
sort node doesn't change at all.

Dropping the index though leads to a completely different plan, and
_that_ sort node is sane (but not fed by a CTE scan, obviously), using
the same 42kB of memory:

-> Sort (cost=59.84..62.34 rows=1000 width=8) (actual
time=57.275..57.355 rows=366 loops=1)

Output: y.d, ((y.d)::date)

Sort Key: ((y.d)::date)

Sort Method: quicksort Memory: 42kB

Buffers: shared hit=4

-> Function Scan on pg_catalog.generate_series y
(cost=0.01..10.01 rows=1000 width=8) (actual time=57.190..57.233
rows=366 loops=1)
Output: y.d, (y.d)::date

Function Call: generate_series((now() - '365
days'::interval), now(), '1 day'::interval)

Maybe the sort node gets attributed wrongly and it's actually the index
scan result that gets sorted? But that wouldn't be necessary, afterall...
Maybe the sort really takes much more memory, but the executor doesn't
report that correctly?
Why does the CTE sort node sort exactly 26.999 rows less than the main
table has rows, when that sports exactly 27.000 rows per day?
Maybe we may have found a long missing optimization opportunity? ;-)

We are seriously clueless; pls. enlighten us! ;-)

Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar(dot)bluth(at)pro-open(dot)de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachment Content-Type Size
testcase.sql application/sql 754 bytes

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2022-10-31 20:40:12 Re: Strange sort node/explain result
Previous Message Lauri Laanmets 2022-10-31 14:16:18 Re: Memory leak on subquery as scalar operand