Quick Links

Re: planner chooses incremental but not the best one

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: planner chooses incremental but not the best one
Date:	2023-12-15 08:58:22
Message-ID:	CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
>

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. As I experimented, this can improve
the estimate for appendrels. For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?

Thanks
Richard

Attachment	Content-Type	Size
v1-0001-Adjust-tuples-estimate-for-appendrel.patch	application/octet-stream	1.9 KB

In response to

Re: planner chooses incremental but not the best one at 2023-12-14 10:02:08 from Richard Guo

Responses

Re: planner chooses incremental but not the best one at 2023-12-17 22:18:37 from Tomas Vondra
Re: planner chooses incremental but not the best one at 2023-12-22 09:17:45 from Sébastien Lardière
Re: planner chooses incremental but not the best one at 2024-02-15 07:35:57 from Andrei Lepikhov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Borisov	2023-12-15 09:13:33	Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message	jian he	2023-12-15 08:36:41	Re: remaining sql/json patches