Re: Possible incorrect row estimation for Gather paths

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible incorrect row estimation for Gather paths
Date: 2024-07-10 14:54:22
Message-ID: CA+FpmFfaCbKy1D9r2cr90oKXRwDiat4mKWFVnT5b82xm8xidRw@mail.gmail.com
Lists: pgsql-hackers

Hello Anthonin,

I spent some time on this problem and your proposed solution.
As I understand it, the patch corrects the row count when the number
of parallel workers is less than 4. Once there are 4 or more workers,
the parallel divisor equals the number of workers, so the estimated
row count is the same with or without the patch. So I agree it is
good to fix this estimate for smaller numbers of workers.
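
For reference, the leader-participation logic in get_parallel_divisor()
(costsize.c) is roughly the following; with 1 worker the divisor is
1 + (1.0 - 0.3) = 1.7, and from 4 workers on the leader contribution
is no longer positive, so the divisor equals the worker count:

static double
get_parallel_divisor(Path *path)
{
    double      parallel_divisor = path->parallel_workers;

    if (parallel_leader_participation)
    {
        double      leader_contribution;

        /* the leader contributes less as more workers are added */
        leader_contribution = 1.0 - (0.3 * path->parallel_workers);
        if (leader_contribution > 0)
            parallel_divisor += leader_contribution;
    }

    return parallel_divisor;
}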

But I don't quite understand the purpose of this:
+ total_groups = input_rel->rows;
+
+ /*
+ * If the number of rows is unknown, fallback to gather rows
+ * estimation
+ */
+ if (total_groups == 0)
+ total_groups = gather_rows_estimate(input_path);

Why not just use total_groups = gather_rows_estimate(input_path)? What
is the importance of having total_groups = input_rel->rows?
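
For context, my reading of the new helper is essentially the following
(paraphrased from the patch, so the exact code may differ):

static double
gather_rows_estimate(Path *partial_path)
{
    double      parallel_divisor = get_parallel_divisor(partial_path);

    /* scale the per-worker estimate back up to a total row count */
    return clamp_row_est(partial_path->rows * parallel_divisor);
}

If that is right, I would expect it and input_rel->rows to usually
agree, hence my question about the fallback.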

Regarding the change the patch introduces in the regression test, I
wonder if we should also test it on larger tables and check for
performance regressions. Imagine the case where Parallel Seq Scan on
extremely_skewed s (cost=0.00..167.01 rows=1 width=4) actually returns
1000 rows instead of 1: which plan would perform better then, or would
we get a totally different plan?
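
(Extrapolating the arithmetic from your example: with a per-worker
join estimate of 1000 rows and one planned worker, the Gather estimate
would be 1000*1 = 1000 rows pre-patch versus rint(1000 * 1.7) = 1700
rows post-patch, so the cost gap between the candidate plans would
grow accordingly.)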

However, my hunch is that there should not be any serious problems,
because before this patch the estimated row count was incorrect
anyway.

I don't see a problem in merging the two patches.

On Fri, 24 May 2024 at 11:35, Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> Hi,
>
> While experimenting with an explain option to display all plan candidates (very rough prototype here [1]), I've noticed some discrepancies in some generated plans.
>
> EXPLAIN (ALL_CANDIDATES) SELECT * FROM pgbench_accounts order by aid;
> Plan 1
> -> Gather Merge (cost=11108.32..22505.38 rows=100000 width=97)
> Workers Planned: 1
> -> Sort (cost=10108.31..10255.37 rows=58824 width=97)
> Sort Key: aid
> -> Parallel Seq Scan on pgbench_accounts (cost=0.00..2228.24 rows=58824 width=97)
> Plan 2
> -> Gather Merge (cost=11108.32..17873.08 rows=58824 width=97)
> Workers Planned: 1
> -> Sort (cost=10108.31..10255.37 rows=58824 width=97)
> Sort Key: aid
> -> Parallel Seq Scan on pgbench_accounts (cost=0.00..2228.24 rows=58824 width=97)
>
> The two plans are identical except that one Gather Merge has a lower estimate of 58K rows.
>
> The first plan is created with generate_useful_gather_paths with override_rows=false, then create_gather_merge_path, and will use rel->rows as the row count (so the 100K rows of pgbench_accounts).
> #0: create_gather_merge_path(...) at pathnode.c:1885:30
> #1: generate_useful_gather_paths(... override_rows=false) at allpaths.c:3286:11
> #2: apply_scanjoin_target_to_paths(...) at planner.c:7744:3
> #3: grouping_planner(...) at planner.c:1638:3
>
> The second plan is created through create_ordered_paths, then create_gather_merge_path, and the number of rows is estimated as (worker_rows * parallel_workers). Since we only have 1 parallel worker, this yields 58K rows.
> #0: create_gather_merge_path(...) at pathnode.c:1885:30
> #1: create_ordered_paths(...) at planner.c:5287:5
> #2: grouping_planner(...) at planner.c:1717:17
>
> The 58K row estimate looks possibly incorrect. A worker row count is estimated using total_rows/parallel_divisor. The parallel_divisor includes the possible leader participation and will be 1.7 for 1 worker, hence the 58K rows (100K/1.7=58K).
> However, the gather merge will only do 58K*1, dropping the leader participation component.
>
> I have a tentative patch split in two changes:
> 1: This is a related refactoring to remove an unnecessary and confusing assignment of rows in create_gather_merge_path. This value is never used and is immediately overwritten in cost_gather_merge.
> 2: This changes the row estimation of gather paths to use worker_rows*parallel_divisor to get a more accurate estimate. Additionally, when creating a gather merge path in create_ordered_paths, we try to use the source rel's rows when available.
>
> The patch triggered a small change in the hash_join regression test. Pre-patch, we had the following candidates.
> Plan 4
> -> Aggregate (cost=511.01..511.02 rows=1 width=8)
> -> Gather (cost=167.02..511.00 rows=2 width=0)
> Workers Planned: 1
> -> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
> Hash Cond: (r.id = s.id)
> -> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
> -> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
> -> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
> Plan 5
> -> Finalize Aggregate (cost=510.92..510.93 rows=1 width=8)
> -> Gather (cost=510.80..510.91 rows=1 width=8)
> Workers Planned: 1
> -> Partial Aggregate (cost=510.80..510.81 rows=1 width=8)
> -> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
> Hash Cond: (r.id = s.id)
> -> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
> -> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
> -> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
>
> Post-patch, the plan candidates became:
> Plan 4
> -> Finalize Aggregate (cost=511.02..511.03 rows=1 width=8)
> -> Gather (cost=510.80..511.01 rows=2 width=8)
> Workers Planned: 1
> -> Partial Aggregate (cost=510.80..510.81 rows=1 width=8)
> -> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
> Hash Cond: (r.id = s.id)
> -> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
> -> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
> -> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
> Plan 5
> -> Aggregate (cost=511.01..511.02 rows=1 width=8)
> -> Gather (cost=167.02..511.00 rows=2 width=0)
> Workers Planned: 1
> -> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
> Hash Cond: (r.id = s.id)
> -> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
> -> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
> -> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
>
> The Finalize Aggregate plan has an increased cost of 1 post-patch due to the number of rows in the Gather node going from 1 to 2 (rint(1 * 1.7)=2). This was enough to make the Aggregate plan cheaper. The test is meant to check the parallel hash join, so updating it with the new cheapest plan looked fine.
>
> Regards,
> Anthonin
>
> [1]: https://github.com/bonnefoa/postgres/tree/plan-candidates

--
Regards,
Rafia Sabih
