Re: incremental sort vs. gather paths

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dave Page <dpage(at)pgadmin(dot)org>
Subject: Re: incremental sort vs. gather paths
Date: 2021-12-16 17:16:49
Message-ID: 2d00f31d-08e7-05b8-ccf3-b0645943d80a@enterprisedb.com
Lists: pgsql-hackers

On 12/16/21 17:48, Robert Haas wrote:
> This code introduced in ba3e76cc571eba3dea19c9465ff15ac3ac186576 looks
> wrong to me:
>
>     total_groups = cheapest_partial_path->rows *
>         cheapest_partial_path->parallel_workers;
>     path = (Path *)
>         create_gather_merge_path(root, ordered_rel,
>                                  path,
>                                  path->pathtarget,
>                                  root->sort_pathkeys, NULL,
>                                  &total_groups);
>
> This too:
>
>     total_groups = input_path->rows *
>         input_path->parallel_workers;
>
> This came to my attention because Dave Page sent me a query plan that
> looks like this:
>
> Gather Merge  (cost=22617.94..22703.35 rows=732 width=97) (actual time=2561.476..2561.856 rows=879 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Sort  (cost=21617.92..21618.83 rows=366 width=97) (actual time=928.329..928.378 rows=293 loops=3)
>         Sort Key: aid
>         Sort Method: quicksort  Memory: 155kB
>         Worker 0:  Sort Method: quicksort  Memory: 25kB
>         Worker 1:  Sort Method: quicksort  Memory: 25kB
>         ->  Parallel Seq Scan on accounts_1m  (cost=0.00..21602.33 rows=366 width=97) (actual time=74.391..74.518 rows=293 loops=3)
>               Filter: (aid < 10000000)
>               Rows Removed by Filter: 333040
>
> If you look at the actual row count estimates, you see that the Gather
> Merge produces 3 times the number of rows that the Parallel Seq Scan
> produces, which is completely correct, because the raw number is 879
> in both cases, but EXPLAIN unhelpfully divides the displayed row count
> by the loop count, which in this case is 3. If you look at the
> estimated row count, you see that the Gather Merge is estimated to
> produce exactly 2 times the number of rows that the nodes under it
> would produce. That's not a very good estimate, unless
> parallel_leader_participation=off, which in this case it isn't.
>
> What's happening here is that the actual number of rows produced by
> accounts_1m is actually 879 and is estimated as 879.
> get_parallel_divisor() decides to divide that number by 2.4, and gets
> 366, because it thinks the leader will do less work than the other
> workers. Then the code above fires and, instead of letting the
> original row count estimate for accounts_1m apply to the Gather Merge
> path, it overrides it with 2 * 366 = 732. This cannot be right.
> Really, I don't think it should be overriding the row count estimate
> at all. But if it is, multiplying by the number of workers can't be
> right, because the leader can also do stuff.
>

Maybe, but other places (predating incremental sort) creating Gather
Merge do the same thing, and commit ba3e76cc57 merely copied this. For
example generate_gather_paths() does this:

    foreach(lc, rel->partial_pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);
        GatherMergePath *path;

        if (subpath->pathkeys == NIL)
            continue;

        rows = subpath->rows * subpath->parallel_workers;
        path = create_gather_merge_path(root, rel, subpath,
                                        rel->reltarget,
                                        subpath->pathkeys, NULL, rowsp);
        add_path(rel, &path->path);
    }

i.e. it's doing the same (rows * parallel_workers) calculation.

It may not be the right way to estimate this, of course. But I'd say
that if ba3e76cc57 is doing it wrong, so are the older places.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
