Re: SET or STRICT modifiers on function affect planner row estimates

From: Michał Kłeczek <michal(at)kleczek(dot)org>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SET or STRICT modifiers on function affect planner row estimates
Date: 2024-09-30 18:36:26
Message-ID: 52BD3E3B-1514-4983-BEC9-1A8742A726DB@kleczek.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
Thanks for taking a look.

> On 30 Sep 2024, at 14:14, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Michal,
> It is difficult to understand the exact problem from your description.
> Can you please provide EXPLAIN outputs showing the expected plan and
> the unexpected plan; plans on the node where the query is run and
> where the partitions are located.

The table structure is as follows:

CREATE TABLE tbl (…) PARTITION BY RANGE year(col02_date)

CREATE TABLE tbl_2015 PARTITION OF tbl FOR VALUES BETWEEN (2023) AND (2024) PARTITION BY HASH (col01_no)
… subsequent years

CREATE TABLE tbl_2021_32_9 PARTITION OF tbl_2021 FOR VALUES WITH (MODULUS 32 REMAINDER 9)

CREATE FOREIGN TABLE tbl_2022_16_9 PARTITION OF tbl_2022 FOR VALUES WITH (MODULUS 32 REMAINDER 9)

All tables are freshly ANALYZEd.

I have a function:

CREATE FUNCTION report(col01 text, year_from, month_from, year_to, month_to) RETURNS … LANGUAGE sql
$$
SELECT
col01_no, year(col02_date), month(col02_date), sum(col03) FROM tbl WHERE col02_no = col02 AND (col02_date conditions) GROUP BY 1, 2, 3
$$

EXPLAIN (…) SELECT * FROM report(…);

gives

Plan 1 (performs pushdown of aggregates):

Append (cost=9.33..76322501.41 rows=3 width=58) (actual time=5.051..6.414 rows=21 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
Output: '2021-01-01'::date
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
Output: '2023-12-31'::date
-> GroupAggregate (cost=9.31..9.85 rows=1 width=58) (actual time=0.073..0.074 rows=0 loops=1)
Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)), sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03) FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)
Group Key: (year(op.col02_date)), (month(op.col02_date))
-> Sort (cost=9.31..9.32 rows=1 width=44) (actual time=0.072..0.073 rows=0 loops=1)
Output: (year(op.col02_date)), (month(op.col02_date)), op.col01_no, op.col03, op.debit_flag
Sort Key: (year(op.col02_date)), (month(op.col02_date))
Sort Method: quicksort Memory: 25kB
-> Index Only Scan using tbl_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2021_32_9 op (cost=0.28..9.30 rows=1 width=44) (actual time=0.063..0.063 rows=0 loops=1)
Output: year(op.col02_date), month(op.col02_date), op.col01_no, op.col03, op.debit_flag
Index Cond: ((op.col01_no = '22109020660000000110831697'::text) AND (op.col02_date >= $0) AND (op.col02_date <= $1))
Filter: ((year(op.col02_date) >= 2021) AND (year(op.col02_date) <= 2023))
Heap Fetches: 0
-> Async Foreign Scan (cost=100.02..76322480.36 rows=1 width=58) (actual time=0.753..0.755 rows=11 loops=1)
Output: op_1.col01_no, (year(op_1.col02_date)), (month(op_1.col02_date)), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag = '0'::bpchar))), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag <> '0'::bpchar))), ((count(1))::double precision)
Relations: Aggregate on (cbt_c61d467d1b5fd1d218d9e6e7dd44a333.tbl_2022_16_9 op_1)
Remote SQL: SELECT col01_no, cbt.year(col02_date), cbt.month(col02_date), sum(col03) FILTER (WHERE (debit_flag = '0')), sum(col03) FILTER (WHERE (debit_flag <> '0')), count(1) FROM cbt.tbl_2022_16_9 WHERE ((cbt.year(col02_date) >= 2021)) AND ((cbt.year(col02_date) <= 2023)) AND ((col02_date >= $1::date)) AND ((col02_date <= $2::date)) AND ((col01_no = '22109020660000000110831697')) GROUP BY 1, 2, 3
-> GroupAggregate (cost=10.63..11.17 rows=1 width=58) (actual time=4.266..4.423 rows=10 loops=1)
Output: op_2.col01_no, (year(op_2.col02_date)), (month(op_2.col02_date)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag = '0'::bpchar)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag <> '0'::bpchar)), count(1)
Group Key: (year(op_2.col02_date)), (month(op_2.col02_date))
-> Sort (cost=10.63..10.64 rows=1 width=44) (actual time=4.238..4.273 rows=735 loops=1)
Output: (year(op_2.col02_date)), (month(op_2.col02_date)), op_2.col01_no, op_2.col03, op_2.debit_flag
Sort Key: (year(op_2.col02_date)), (month(op_2.col02_date))
Sort Method: quicksort Memory: 82kB
-> Index Only Scan using tbl_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2023_128_9 op_2 (cost=0.54..10.62 rows=1 width=44) (actual time=0.295..4.059 rows=735 loops=1)
Output: year(op_2.col02_date), month(op_2.col02_date), op_2.col01_no, op_2.col03, op_2.debit_flag
Index Cond: ((op_2.col01_no = '22109020660000000110831697'::text) AND (op_2.col02_date >= $0) AND (op_2.col02_date <= $1))
Filter: ((year(op_2.col02_date) >= 2021) AND (year(op_2.col02_date) <= 2023))
Heap Fetches: 0

BUT after I perform
CREATE OR REPLACE report(…)
STRICT TO … AS … (same code)

The plan (as reported by auto_explain) changes to:

Plan 2 (no pushdown):

GroupAggregate (cost=1781983216.68..1781983324.62 rows=200 width=58) (actual time=16.065..16.432 rows=21 loops=1)
Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)), sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03) FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)
Group Key: (year(op.col02_date)), (month(op.col02_date))
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Output: make_date($2, $3, 1)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.02 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=1)
Output: (((make_date($4, $5, 1) + '1 mon'::interval) - '1 day'::interval))::date
-> Sort (cost=1781983216.65..1781983217.33 rows=272 width=44) (actual time=16.041..16.117 rows=1564 loops=1)
Output: (year(op.col02_date)), (month(op.col02_date)), op.col01_no, op.col03, op.debit_flag
Sort Key: (year(op.col02_date)), (month(op.col02_date))
Sort Method: quicksort Memory: 171kB
-> Append (cost=0.55..1781983205.65 rows=272 width=44) (actual time=1.013..15.445 rows=1564 loops=1)
Subplans Removed: 269
-> Index Only Scan using accoper_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2021_32_9 op_1 (cost=0.28..9.30 rows=1 width=44) (actual time=0.084..0.084 rows=0 loops=1)
Output: year(op_1.col02_date), month(op_1.col02_date), op_1.col01_no, op_1.col03, op_1.debit_flag
Index Cond: ((op_1.col01_no = $1) AND (op_1.col02_date >= $0) AND (op_1.col02_date <= $1))
Filter: ((year(op_1.col02_date) >= $2) AND (year(op_1.col02_date) <= $4))
Heap Fetches: 0
-> Async Foreign Scan on cbt_c61d467d1b5fd1d218d9e6e7dd44a333.accoper_2022_16_9 op_2 (cost=100.00..76322479.83 rows=1 width=44) (actual time=0.658..3.870 rows=829 loops=1)
Output: year(op_2.col02_date), month(op_2.col02_date), op_2.col01_no, op_2.col03, op_2.debit_flag
Remote SQL: SELECT col03, col02_date, debit_flag, col01_no FROM cbt.accoper_2022_16_9 WHERE ((col02_date >= $1::date)) AND ((col02_date <= $2::date)) AND ((col01_no = $3::text)) AND ((cbt.year(col02_date) >= $4::integer)) AND ((cbt.year(col02_date) <= $5::integer))
-> Index Only Scan using accoper_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2023_128_9 op_3 (cost=0.54..10.62 rows=1 width=44) (actual time=0.361..4.043 rows=735 loops=1)
Output: year(op_3.col02_date), month(op_3.col02_date), op_3.col01_no, op_3.col03, op_3.debit_flag
Index Cond: ((op_3.col01_no = $1) AND (op_3.col02_date >= $0) AND (op_3.col02_date <= $1))
Filter: ((year(op_3.col02_date) >= $2) AND (year(op_3.col02_date) <= $4))
Heap Fetches: 0

I understand wrong rows estimates ( =1 ) are due to missing statistics on expressions year() and month().

But why plans are different?

BTW. I tried to add extended statistics on the above expressions but ANALYZE didn’t seem to update any values (as seen in pg_stat_ext) - ndistinct is NULL for example.

Thanks,

--
Michal

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-09-30 18:49:41 Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
Previous Message Fujii Masao 2024-09-30 18:33:30 Re: Inconsistency in reporting checkpointer stats