Re: Adjust tuples estimate for appendrels

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tender Wang <tndrwang(at)gmail(dot)com>
Subject: Re: Adjust tuples estimate for appendrels
Date: 2025-01-27 17:52:34
Message-ID: cd6ef6c8-c99d-4772-bc27-7cbd3b4fd9c6@postgrespro.ru
Lists: pgsql-hackers

Hi!

On 27.01.2025 11:05, Richard Guo wrote:
> (This was briefly discussed in [1], which primarily focuses on the
> incremental sort regression. So start a new thread for this topic.)
>
> In set_append_rel_size(), we currently set rel->tuples to rel->rows
> for an appendrel. Generally, rel->tuples is the raw number of tuples
> in the relation and rel->rows is the estimated number of tuples after
> the relation's restriction clauses have been applied. Although an
> appendrel itself doesn't directly enforce any quals today, its child
> relations may. Therefore, setting rel->tuples equal to rel->rows for
> an appendrel isn't always appropriate.
>
> AFAICS, doing so can lead to issues in cost estimates. For instance,
> when estimating the number of distinct values from an appendrel, we
> would not be able to adjust the estimate based on the restriction
> selectivity (see estimate_num_groups()).
>
> Attached is a patch that sets an appendrel's tuples to the total
> number of tuples accumulated from each live child, which I believe
> aligns better with reality.
>
> Here's a simple example that demonstrates how this change improves
> cost estimates in certain cases.
>
> create table p (a int, b int, c float) partition by range(a);
> create table p1 partition of p for values from (0) to (1000);
> create table p2 partition of p for values from (1000) to (2000);
>
> insert into p select i%2000, random(1, 100000), random(1, 100000) from
> generate_series(1, 1000000)i;
>
> analyze p;
>
> explain analyze select b from p where c < 10000 group by b;
>
> -- without this patch
> HashAggregate (cost=18651.38..19568.54 rows=91716 width=4)
> (actual time=467.859..487.227 rows=63346 loops=1)
>
> -- with this patch
> HashAggregate (cost=18651.38..19275.60 rows=62422 width=4)
> (actual time=447.383..466.351 rows=63346 loops=1)
I looked at it and agree with your solution.
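
As an illustration of what the accumulated number corresponds to (this
is not part of the patch, just a way to see it from SQL): after ANALYZE,
the raw per-child tuple counts are visible in pg_class.reltuples, and
the patched appendrel's rel->tuples should effectively be their sum over
the live children, e.g. for the example above:

-- raw per-child tuple counts collected by ANALYZE
select relname, reltuples from pg_class where relname in ('p1', 'p2');
-- approximately what the patched appendrel's rel->tuples becomes
select sum(reltuples) from pg_class where relname in ('p1', 'p2');

rel->rows, in contrast, stays the post-filter estimate that EXPLAIN
reports for the Append.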
>
> Unfortunately, I failed to come up with a stable test case that shows
> a plan diff with this change. So the attached patch does not include
> a test case for now.

I created a stable test:

create table p (a int, b int, c float) partition by range(a);
create table p1 partition of p for values from (0) to (100);
create table p2 partition of p for values from (100) to (1000);
insert into p select i%200, i%300, i%400 from
generate_series(1, 1000)i;
analyze p;
SELECT * FROM check_estimated_rows('select b from p where c < 10 group by b');
 estimated | actual
-----------+--------
        27 |     29
(1 row)
drop table p;

I added it in the diff file.
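
For anyone trying this outside the regression suite: check_estimated_rows
is the plpgsql helper used by the extended-statistics tests
(src/test/regress/sql/stats_ext.sql). It looks roughly like this (a
sketch of that helper, not a verbatim copy):

create function check_estimated_rows(text) returns table (estimated int, actual int)
language plpgsql as
$$
declare
    ln text;
    tmp text[];
    first_row bool := true;
begin
    -- run EXPLAIN ANALYZE on the given query and pull the estimated
    -- and actual row counts out of its first output line
    for ln in
        execute format('explain analyze %s', $1)
    loop
        if first_row then
            first_row := false;
            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
            return query select tmp[1]::int, tmp[2]::int;
        end if;
    end loop;
end;
$$;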

--
Regards,
Alena Rybakina
Postgres Professional

Attachment: test.diff (text/x-patch, 1.7 KB)
