From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Top-N sorts in EXPLAIN, row count estimates, and parallelism
Date: 2019-05-23 19:22:48
Message-ID: 20190523192248.lqapbtgsnyerbbop@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
Right now we don't indicate that a top-N sort is going to be used in
plain EXPLAIN, only in EXPLAIN ANALYZE. That's imo suboptimal, because
one quite legitimately might want to know that before actually
executing the query, as it makes a huge difference to its actual
resource intensity.
postgres[28165][1]=# EXPLAIN (VERBOSE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=12419057.53..12419058.70 rows=10 width=45) │
│ Output: hash, count │
│ -> Gather Merge (cost=12419057.53..66041805.65 rows=459591466 width=45) │
│ Output: hash, count │
│ Workers Planned: 2 │
│ -> Sort (cost=12418057.51..12992546.84 rows=229795733 width=45) │
│ Output: hash, count │
│ Sort Key: hashes.count DESC │
│ -> Parallel Seq Scan on public.hashes (cost=0.00..7452254.33 rows=229795733 width=45) │
│ Output: hash, count │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)
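
Note that the planner evidently *does* know this will be a top-N sort -
the Sort's startup cost above only makes sense with bounded-sort
costing. As a back-of-the-envelope check, here's the bounded-sort
branch of cost_sort() as I read it in costsize.c, with all cost GUCs at
their defaults (a sketch, not authoritative):

/*
 * Check that the Sort above was costed as a bounded (top-N) sort.
 * Paraphrases the bounded-sort branch of cost_sort() in
 * src/backend/optimizer/path/costsize.c as I read it; treat this as a
 * sketch, with all cost GUCs at their defaults.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  cpu_operator_cost = 0.0025;     /* default GUC */
    double  comparison_cost = 2.0 * cpu_operator_cost;
    double  tuples = 229795733.0;           /* per-worker input rows */
    double  output_tuples = 10.0;           /* the LIMIT */
    double  input_total = 7452254.33;       /* Parallel Seq Scan total */

    /*
     * A bounded heap sort does roughly N * log2(2K) comparisons to keep
     * the best K of N tuples.
     */
    double  startup = input_total +
        comparison_cost * tuples * log2(2.0 * output_tuples);

    printf("Sort startup cost: %.2f\n", startup);   /* ~12418057.49 */
    return 0;
}

That prints ~12418057.49, i.e. the plan's 12418057.51 to within the
rounding of the displayed input cost. So the information exists at plan
time; we just never print it.
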
postgres[28165][1]=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=12419057.53..12419058.70 rows=10 width=45) (actual time=115204.278..115205.024 rows=10 loops=1) │
│ Output: hash, count │
│ -> Gather Merge (cost=12419057.53..66041805.65 rows=459591466 width=45) (actual time=115204.276..115205.020 rows=10 loops=1) │
│ Output: hash, count │
│ Workers Planned: 2 │
│ Workers Launched: 2 │
│ -> Sort (cost=12418057.51..12992546.84 rows=229795733 width=45) (actual time=115192.189..115192.189 rows=7 loops=3) │
│ Output: hash, count │
│ Sort Key: hashes.count DESC │
│ Sort Method: top-N heapsort Memory: 25kB │
│ Worker 0: Sort Method: top-N heapsort Memory: 25kB │
│ Worker 1: Sort Method: top-N heapsort Memory: 25kB │
│ Worker 0: actual time=115186.558..115186.559 rows=10 loops=1 │
│ Worker 1: actual time=115186.540..115186.540 rows=10 loops=1 │
│ -> Parallel Seq Scan on public.hashes (cost=0.00..7452254.33 rows=229795733 width=45) (actual time=0.080..90442.215 rows=183836589 loops=3) │
│ Output: hash, count │
│ Worker 0: actual time=0.111..90366.999 rows=183976442 loops=1 │
│ Worker 1: actual time=0.107..90461.921 rows=184707680 loops=1 │
│ Planning Time: 0.121 ms │
│ Execution Time: 115205.053 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(20 rows)
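
(As an aside, the 25kB of sort memory in each participant is the top-N
heapsort earning its keep: it only ever retains the N current-best rows
in a small heap while the rest of the input streams past. A standalone
sketch of the idea - hypothetical code, not how tuplesort.c actually
spells it:

/*
 * Standalone sketch of a top-N heapsort (hypothetical code, not the
 * tuplesort.c implementation): keep the N largest values seen so far
 * in a min-heap of size N, so memory stays O(N) however many rows
 * stream past.
 */
#include <stdio.h>
#include <stdlib.h>

#define TOP_N 10

static long heap[TOP_N];    /* min-heap: heap[0] is the smallest kept */
static int  nkept = 0;

static void
sift_down(int i)
{
    for (;;)
    {
        int     min = i, l = 2 * i + 1, r = 2 * i + 2;

        if (l < nkept && heap[l] < heap[min])
            min = l;
        if (r < nkept && heap[r] < heap[min])
            min = r;
        if (min == i)
            return;
        long tmp = heap[i]; heap[i] = heap[min]; heap[min] = tmp;
        i = min;
    }
}

static void
consider(long v)
{
    if (nkept < TOP_N)
    {
        /* heap not full yet: append, then restore heap order upwards */
        int     i = nkept++;

        heap[i] = v;
        while (i > 0 && heap[(i - 1) / 2] > heap[i])
        {
            long tmp = heap[i]; heap[i] = heap[(i - 1) / 2];
            heap[(i - 1) / 2] = tmp;
            i = (i - 1) / 2;
        }
    }
    else if (v > heap[0])
    {
        /* beats the smallest kept value: replace root, re-heapify */
        heap[0] = v;
        sift_down(0);
    }
    /* else: drop the row on the floor - that's where the savings are */
}

int
main(void)
{
    /* stand-in for scanning a few hundred million heap tuples */
    for (long i = 0; i < 10000000; i++)
        consider(rand());

    /* heap order, not sorted; a final N-pop pass would order them */
    for (int i = 0; i < nkept; i++)
        printf("%ld\n", heap[i]);
    return 0;
}

Each participant ends up with just its own 10 best rows, and the Gather
Merge then merges those three tiny sorted streams and stops after 10.)
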
It's also noticeable that we preposterously assume that the sort will
actually return all of its input rows, despite being a top-N style
sort. That seems bad for the costing of the parallel query, because I
think we'll then assume the Gather Merge costs tups * parallel_tuple_cost?
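
If I'm reading cost_gather_merge() in costsize.c correctly, that's
pretty much what happens. Reconstructing the Gather Merge cost above
from the default GUCs (again just a sketch of my reading, not
authoritative):

/*
 * Rough reconstruction of the Gather Merge cost above, paraphrasing
 * cost_gather_merge() from costsize.c as I read it (details may vary
 * by version). All cost GUCs at their defaults.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  cpu_operator_cost = 0.0025;
    double  parallel_tuple_cost = 0.1;
    double  parallel_setup_cost = 1000.0;

    double  rows = 459591466.0;     /* the (inflated) GM row estimate */
    double  N = 2 + 1;              /* planned workers plus leader */
    double  logN = log2(N);
    double  comparison_cost = 2.0 * cpu_operator_cost;

    /* input: the Sort's costs from the plan above */
    double  input_startup = 12418057.51;
    double  input_total = 12992546.84;

    double  startup = input_startup
        + comparison_cost * N * logN            /* heap creation */
        + parallel_setup_cost;

    double  ipc_cost = parallel_tuple_cost * rows * 1.05;  /* +5% for GM */
    double  run = (input_total - input_startup)
        + rows * comparison_cost * logN         /* heap maintenance */
        + rows * cpu_operator_cost              /* heap management */
        + ipc_cost;

    printf("startup: %.2f  total: %.2f  ipc share: %.2f\n",
           startup, startup + run, ipc_cost);
    return 0;
}

That reproduces the plan's 12419057.53 / 66041805.65 almost exactly,
and the parallel_tuple_cost term alone is ~48.3M - nearly three
quarters of the total, charged for tuples a top-N sort will never emit.
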
I'm also unclear as to why the Gather Merge ends up with twice as many
estimated rows as its input sort: 459591466 is exactly 2 * 229795733,
so presumably the per-worker estimate is just being scaled by the
number of planned workers, without accounting for the leader's share.
Greetings,
Andres Freund