Re: Possible incorrect row estimation for Gather paths

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible incorrect row estimation for Gather paths
Date: 2024-07-17 01:59:05
Message-ID: CAMbWs4-ZkH9t40LH8LMyZUuqbBww1k9OD+CH+O_7LJ7TwP3Zhw@mail.gmail.com
Lists: pgsql-hackers

I can reproduce this problem with the query below.

explain (costs on) select * from tenk1 order by twenty;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
   Workers Planned: 1
   ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
         Sort Key: twenty
         ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
(5 rows)
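
To spell out why rows=5882 on the Gather Merge node looks wrong, here is a
rough Python sketch of the arithmetic. It assumes the planner's usual
parallel-divisor heuristic (each worker counts as 1, and the leader
contributes 1 - 0.3 * workers when that is positive) and that tenk1 holds
10000 rows, as in the regression database. The function names are made up for
illustration and are not PostgreSQL APIs; the clamping done by the real
planner is left out.

```python
def parallel_divisor(workers: int, leader_participates: bool = True) -> float:
    """Assumed heuristic: workers plus a shrinking leader contribution."""
    divisor = float(workers)
    if leader_participates:
        # The leader is assumed to lose about 30% of its time per worker.
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def per_worker_rows(total_rows: float, workers: int) -> int:
    """Rows each process is expected to handle in a partial path."""
    return round(total_rows / parallel_divisor(workers))


def gathered_rows(partial_rows: float, workers: int) -> int:
    """Rows expected above the Gather: undo the per-worker division."""
    return round(partial_rows * parallel_divisor(workers))


if __name__ == "__main__":
    partial = per_worker_rows(10000, workers=1)
    print(partial)                            # per-worker estimate
    print(gathered_rows(partial, workers=1))  # estimate above the Gather
```

With one planned worker the divisor is 1.7, so the partial paths get
10000 / 1.7, which rounds to 5882. The Gather Merge node reports that same
per-worker figure, whereas multiplying back by the divisor gives roughly the
full table.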

On Tue, Jul 16, 2024 at 3:56 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
> The initial goal was to use the source tuples if available and avoid
> possible rounding errors. Though I realise that the difference would
> be minimal. For example, 200K tuples and 3 workers would yield
> int(int(200000 / 2.4) * 2.4)=199999. That is probably not worth the
> additional complexity, I've updated the patch to just use
> gather_rows_estimate.

I wonder if the changes in create_ordered_paths should also be reduced
to 'total_groups = gather_rows_estimate(path);'.
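
For reference, the rounding loss mentioned in the quoted text can be checked
with a few lines of Python. This is only a sketch of the quoted arithmetic,
taking the divisor 2.4 as given there; round_trip is a hypothetical name, and
int() truncation is used because the quoted expression uses it.

```python
def round_trip(total_rows: int, divisor: float) -> int:
    """Divide down to a truncated per-worker count, then scale back up."""
    per_worker = int(total_rows / divisor)   # int(200000 / 2.4) == 83333
    return int(per_worker * divisor)         # int(83333 * 2.4)


if __name__ == "__main__":
    total = 200000
    recovered = round_trip(total, 2.4)
    print(recovered, total - recovered)
```

The round trip loses a single row out of 200000, which supports relying on
the row-estimate helper alone rather than carrying the source tuple count
around.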

> I've also realised from the comments in optimizer.h that
> nodes/pathnodes.h should not be included there and fixed it.

I think perhaps it's better to declare gather_rows_estimate() in
cost.h rather than optimizer.h.
(BTW, I wonder if compute_gather_rows() would be a better name?)

I noticed another issue in generate_useful_gather_paths() -- *rowsp
would have an uninitialized value if override_rows is true and we use
incremental sort for gather merge. I think we should fix this too.

Thanks
Richard
