From: | Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Possible incorrect row estimation for Gather paths |
Date: | 2024-07-22 08:47:24 |
Message-ID: | CAO6_XqpD-WJE7gH6K3JnqTwsq0UkAMph_Gx-DCicMsL-WFdrdg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 17, 2024 at 3:59 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> I can reproduce this problem with the query below.
>
> explain (costs on) select * from tenk1 order by twenty;
> QUERY PLAN
> ---------------------------------------------------------------------------------
> Gather Merge (cost=772.11..830.93 rows=5882 width=244)
> Workers Planned: 1
> -> Sort (cost=772.10..786.80 rows=5882 width=244)
> Sort Key: twenty
> -> Parallel Seq Scan on tenk1 (cost=0.00..403.82 rows=5882 width=244)
> (5 rows)
I was looking for a test to add in the regress checks that wouldn't
rely on explain cost since it is disabled. However, I've realised I
could do something similar to what's done in stats_ext with the
check_estimated_rows function. I've added the get_estimated_rows test
function that extracts the estimated rows from the top node and uses
it to check the gather nodes' estimates. get_estimated_rows uses a
simple explain compared to check_estimated_rows which relies on an
explain analyze.
> I wonder if the changes in create_ordered_paths should also be reduced
> to 'total_groups = gather_rows_estimate(path);'.
It should already be the case with the patch v2. It does create
rounding errors that are visible in the tests but they shouldn't
exceed +1 or -1.
> I think perhaps it's better to declare gather_rows_estimate() in
> cost.h rather than optimizer.h.
> (BTW, I wonder if compute_gather_rows() would be a better name?)
Good point, I've moved the declaration and renamed it.
> I noticed another issue in generate_useful_gather_paths() -- *rowsp
> would have a random value if override_rows is true and we use
> incremental sort for gather merge. I think we should fix this too.
That seems to be the case. I've tried to find a query that could
trigger this codepath without success. All grouping and distinct paths
I've tried where fully sorted and didn't trigger an incremental sort.
I will need a bit more time to check this.
In the meantime, I've updated the patches with the review comments.
Regards,
Anthonin
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Remove-unnecessary-assignment-of-path-rows-in-gat.patch | application/octet-stream | 1.2 KB |
v3-0002-Fix-row-estimation-in-gather-paths.patch | application/octet-stream | 10.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2024-07-22 08:55:30 | Re: proposal: schema variables |
Previous Message | Amit Langote | 2024-07-22 08:46:18 | Re: pgsql: Add more SQL/JSON constructor functions |