From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: planner chooses incremental but not the best one |
Date: | 2023-12-15 08:58:22 |
Message-ID: | CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
>
I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.
And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. As I experimented, this can improve
the estimate for appendrels. For instance,
create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);
insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;
explain analyze select b from t where c < 0.1 group by b;
-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)
-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)
With the patch the estimate for the number of distinct 'b' values is
more accurate.
BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Adjust-tuples-estimate-for-appendrel.patch | application/octet-stream | 1.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Borisov | 2023-12-15 09:13:33 | Re: Add 64-bit XIDs into PostgreSQL 15 |
Previous Message | jian he | 2023-12-15 08:36:41 | Re: remaining sql/json patches |