Re: Unexpected Performance for the Function simplify_function

From: Shiv Iyer <shiv(at)minervadb(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Unexpected Performance for the Function simplify_function
Date: 2024-10-24 19:48:57
Message-ID: CAALLqt99Bw+tBER8naQKf=__0yFm+b5MeXV2hjr45_g8oO+aig@mail.gmail.com
Lists: pgsql-performance

Hello,

The query plans and results you shared illustrate the unexpected
performance differences between using and bypassing the simplify_function()
logic in PostgreSQL’s optimizer. Here’s an in-depth analysis and thoughts
on optimizing this scenario:

*Overview of the Problem*

The purpose of simplify_function() in PostgreSQL’s optimizer is to replace
function and operator calls with constant values when their inputs are
constant. In this query it folds date '1993-08-01' + interval '3' month into
the constant '1993-11-01 00:00:00'::timestamp, as the first plan shows.
However, as your results demonstrate, this simplification has an adverse
effect on this particular workload: disabling simplify_function() yields a
notable reduction in both the estimated query cost and the execution time.

*Performance Gains Observed*

• *Estimated Cost Reduction*: 39.69%

• *Execution Time Reduction*: 32.54%
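These figures follow directly from the headline numbers in the two plans you
posted (total cost 200479.71 vs 120917.71; execution time 512.015 ms vs
345.427 ms). A quick sanity check:

```python
# Recomputing the reported reductions from the top-line plan numbers.
cost_simplified, cost_bypassed = 200479.71, 120917.71   # total plan cost
time_simplified, time_bypassed = 512.015, 345.427       # execution time, ms

cost_reduction = (1 - cost_bypassed / cost_simplified) * 100
time_reduction = (1 - time_bypassed / time_simplified) * 100

print(f"cost reduction: {cost_reduction:.2f}%")   # 39.69%
print(f"time reduction: {time_reduction:.2f}%")   # 32.54%
```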

*Key Insights from the Execution Plans*

1. *Hash Joins and Parallel Execution*: Both plans rely on hash joins and
parallel execution to spread the work across workers; the Gather Merge node
in each confirms that two parallel workers were planned and launched.

2. *Sorting and Finalization*: Both plans end with a Finalize GroupAggregate
followed by a top-N sort on the computed revenue, sum(l_extendedprice * (1 -
l_discount)). The revenue expression itself is effectively identical in both
plans; the real difference sits lower in the tree, in how the date-range
predicate on orders is represented (a folded timestamp constant versus the
original date + interval expression).

3. *Access Path Change, Not Disk I/O*: Sort disk usage is nearly identical in
the two plans (roughly 7.2–7.4 MB per worker in both), so the savings do not
come from sorting. The decisive difference is the access path for lineitem:
the plan without simplify_function() drives an index scan on lineitem_pkey
through a nested loop (about 2 rows per order), instead of a full parallel
sequential scan that reads and filters roughly 2 million lineitem rows per
worker.
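One more data point worth pulling from the plans: the row estimates feeding
the join over lineitem and orders. The simplified plan estimates 23777 rows
versus 38183 actual; the unsimplified plan estimates only 3078 rows for the
same join, because the unfolded date expression gets a default selectivity.
A small sketch of the estimation error in each case:

```python
# Row-estimate error at the lineitem/orders join, taken from the two plans.
actual = 38183              # actual rows per loop in both plans

est_simplified = 23777      # estimate with the folded constant
est_bypassed = 3078         # estimate with the unfolded expression

err_simplified = actual / est_simplified
err_bypassed = actual / est_bypassed

print(f"simplified: {err_simplified:.1f}x underestimate")   # ~1.6x
print(f"bypassed:   {err_bypassed:.1f}x underestimate")     # ~12.4x
```

Ironically, the bypassed plan’s estimate is far worse, yet it steers the
planner toward the faster nested-loop/index-scan strategy.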

*Why Might simplify_function() be Causing Problems?*

The exact issue seems to lie in how simplify_function() transforms
expressions. There could be multiple factors at play, including:

1. *Expression Over-Simplification*: It’s possible that simplify_function()
results in a less optimal execution plan due to over-simplification, which
reduces PostgreSQL’s ability to leverage index scans, joins, or parallelism
effectively.

2. *Changes in Cost Estimation*: Folding the predicate to a constant changes
the planner’s selectivity estimate for the orders scan (rows=23858 with the
constant versus a default-selectivity rows=3125 with the unfolded
expression, against an actual 19046 per worker). That estimate cascades
upward and pushes the planner toward a different join strategy; notably, the
less accurate estimate happens to yield the faster plan here.

3. *Run-Time Evaluation Trade-off*: Leaving the expression unsimplified
means the executor must re-evaluate it at run time instead of once at plan
time, which is normally pure overhead. That the unsimplified plan still wins
suggests the change in plan shape far outweighs this per-evaluation cost.
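To make the trade-off concrete, here is a toy sketch (plain Python, not
PostgreSQL internals) of what plan-time folding buys: the folded form
evaluates the constant sub-expression once, while the unfolded form pays one
evaluation per row at execution time.

```python
from datetime import date, timedelta

calls = {"count": 0}

def upper_bound():
    # Stands in for evaluating: date '1993-08-01' + interval '3' month
    calls["count"] += 1
    return date(1993, 8, 1) + timedelta(days=92)  # -> 1993-11-01

rows = [date(1993, m, 15) for m in range(1, 13)]  # one row per month

# Folded: compute the bound once at "plan time", then filter.
bound = upper_bound()
folded = [d for d in rows if date(1993, 8, 1) <= d < bound]
folds = calls["count"]                  # exactly 1 evaluation

# Unfolded: the bound is re-evaluated for every row.
calls["count"] = 0
unfolded = [d for d in rows if upper_bound() > d >= date(1993, 8, 1)]
per_row = calls["count"]                # one evaluation per row

assert folded == unfolded               # same answer either way
print(folds, per_row)                   # 1 12
```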

*Potential Strategies to Address This Problem*

To address this problem, we have a few avenues to explore within the
PostgreSQL planner and optimizer:

1. *Investigate and Optimize Simplify Function Logic*:

• Understand the specific cases where simplify_function() introduces
overhead or leads to suboptimal execution plans. This could involve
reviewing whether the simplification is inadvertently forcing PostgreSQL to
choose less efficient join strategies, index scans, or aggregations.

• Analyze the functions targeted by simplify_function() and consider how
their simplification impacts the plan selection logic.

2. *Fine-Tuning Cost Estimation*:

• PostgreSQL’s cost estimation formulas could be missing certain cost
elements introduced by the simplified expressions. Reviewing and refining
the cost model for simplified expressions could lead to more accurate plan
choices.

3. *Expression Memoization or Pre-Aggregation*:

• One approach could be to decouple run-time evaluation cost from plan-time
folding. Rather than simplifying certain expressions in the plan, PostgreSQL
could cache or pre-aggregate the result of a stable expression at execution
time, computing it once per query and reusing it for every row, thereby
avoiding unnecessary recomputation.
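As an illustrative analogy for strategy 3 (again plain Python, not a
PostgreSQL API): memoizing a stable expression makes it cost one evaluation
per distinct set of inputs, so it could stay unsimplified in the plan while
still being computed only once at run time.

```python
from functools import lru_cache

evals = {"count": 0}

@lru_cache(maxsize=None)
def stable_expr(base: str, months: int) -> str:
    # Stands in for evaluating a stable expression such as
    # date 'base' + interval 'months' month at execution time.
    evals["count"] += 1
    return f"{base} + {months} months"

# 1000 "rows" all reference the same expression; it is evaluated once.
results = [stable_expr("1993-08-01", 3) for _ in range(1000)]

print(evals["count"])         # 1
print(len(results))           # 1000
```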

*Recommendations to Move Forward*

The results you presented indicate a substantial improvement when bypassing
simplify_function(), which suggests there is room for optimization in this
specific area of the PostgreSQL planner.

1. *Perform More Tests with Varying Workloads*: Test whether disabling the
simplify_function() yields consistent benefits across different workloads
or if it’s specific to this TPC-H Query 10 pattern.

2. *Community Discussion*: The results suggest that this may warrant a
discussion with the PostgreSQL development community, especially if it’s
not an isolated case. Optimizations in core PostgreSQL logic should go
through community review and testing.

3. *Patch Submission*: If you have identified specific changes to the
simplify_function() that improve efficiency, prepare a patch for submission
to the PostgreSQL community, ensuring that the patch is thoroughly reviewed
and benchmarked.

*Conclusion*

The primary takeaway is that simplify_function() seems to be triggering
less efficient query plans in this case, likely due to over-simplification
of expressions. The solution should involve a detailed examination of the
expression transformations and their interaction with PostgreSQL’s planner.
This could lead to refining either the expression simplification logic or
the cost estimation for such transformed expressions.

If this behavior is generalizable beyond just TPC-H Query 10, it could be a
valuable enhancement to PostgreSQL’s optimizer. Otherwise, a targeted fix
for specific scenarios might be sufficient.

On Fri, Oct 25, 2024 at 1:14 AM Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu> wrote:

> Dear PostgreSQL Community,
>
> For the query 10 in TPC-H benchmark:
> select
> c_custkey,
> c_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue,
> c_acctbal,
> n_name,
> c_address,
> c_phone,
> c_comment
> from
> CUSTOMER,
> ORDERS,
> LINEITEM,
> NATION
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and o_orderdate >= date '1993-08-01'
> and o_orderdate < date '1993-08-01' + interval '3' month
> and l_returnflag = 'R'
> and c_nationkey = n_nationkey
> group by
> c_custkey,
> c_name,
> c_acctbal,
> c_phone,
> n_name,
> c_address,
> c_comment
> order by
> revenue desc
> limit
> 20;
>
>
> Its query plan is:
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=200479.71..*200479.76 *rows=20 width=205) (actual
> time=506.558..510.594 rows=20 loops=1)
> -> Sort (cost=200479.71..200622.37 rows=57064 width=205) (actual
> time=506.557..510.591 rows=20 loops=1)
> Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
> lineitem.l_discount)))) DESC
> Sort Method: top-N heapsort Memory: 34kB
> -> Finalize GroupAggregate (cost=191629.63..198961.25
> rows=57064 width=205) (actual time=441.132..501.986 rows=37925 loops=1)
> Group Key: customer.c_custkey, nation.n_name
> -> Gather Merge (cost=191629.63..197772.41 rows=47554
> width=205) (actual time=441.124..474.623 rows=37925 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial GroupAggregate
> (cost=190629.61..191283.48 rows=23777 width=205) (actual
> time=437.497..464.923 rows=12642 loops=3)
> Group Key: customer.c_custkey, nation.n_name
> -> Sort (cost=190629.61..190689.05 rows=23777
> width=185) (actual time=437.485..441.339 rows=38183 loops=3)
> Sort Key: customer.c_custkey,
> nation.n_name
> Sort Method: external merge Disk: 7184kB
> Worker 0: Sort Method: external merge
> Disk: 7448kB
> Worker 1: Sort Method: external merge
> Disk: 7264kB
> -> Hash Join (cost=181606.66..186706.85
> rows=23777 width=185) (actual time=385.555..418.269 rows=38183 loops=3)
> Hash Cond: (customer.c_nationkey =
> nation.n_nationkey)
> -> Parallel Hash Join
> (cost=181605.09..186632.29 rows=23777 width=160) (actual
> time=385.484..411.936 rows=38183 loops=3)
> Hash Cond:
> (customer.c_custkey = orders.o_custkey)
> -> Parallel Seq Scan on
> customer (cost=0.00..4225.00 rows=62500 width=148) (actual
> time=0.028..9.805 rows=50000 loops=3)
> -> Parallel Hash
> (cost=181307.88..181307.88 rows=23777 width=16) (actual
> time=385.060..385.063 rows=38183 loops=3)
> Buckets: 131072
> (originally 65536) Batches: 1 (originally 1) Memory Usage: 7648kB
> -> Parallel Hash Join
> (cost=35809.22..181307.88 rows=23777 width=16) (actual
> time=69.608..371.381 rows=38183 loops=3)
> Hash Cond:
> (lineitem.l_orderkey = orders.o_orderkey)
> -> Parallel Seq
> Scan on lineitem (cost=0.00..143863.66 rows=622855 width=16) (actual
> time=0.024..255.818 rows=492957 loops=3)
> Filter:
> (l_returnflag = 'R'::bpchar)
> Rows
> Removed by Filter: 1507448
> -> Parallel Hash
> (cost=35511.00..35511.00 rows=23858 width=8) (actual time=68.857..68.858
> rows=19046 loops=3)
> Buckets:
> 65536 Batches: 1 Memory Usage: 2816kB
> ->
> Parallel Seq Scan on orders (cost=0.00..35511.00 rows=23858 width=8)
> (actual time=0.033..62.907 rows=19046 loops=3)
>
> Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate <
> '1993-11-01 00:00:00'::timestamp without time zone))
> Rows
> Removed by Filter: 480954
> -> Hash (cost=1.25..1.25 rows=25
> width=33) (actual time=0.037..0.037 rows=25 loops=3)
> Buckets: 1024 Batches: 1
> Memory Usage: 10kB
> -> Seq Scan on nation
> (cost=0.00..1.25 rows=25 width=33) (actual time=0.020..0.024 rows=25
> loops=3)
> Planning Time: 2.295 ms
> Execution Time: *512.015 *ms
> (38 rows)
>
>
> While if I apply this patch to disable the simplify_function():
>
> diff --git a/src/backend/optimizer/util/clauses.c
> b/src/backend/optimizer/util/clauses.c
> index b4e085e9d4..155bbd9fbc 100644
> --- a/src/backend/optimizer/util/clauses.c
> +++ b/src/backend/optimizer/util/clauses.c
> @@ -2636,15 +2636,6 @@ eval_const_expressions_mutator(Node *node,
> * Code for op/func reduction is pretty
> bulky, so split it out
> * as a separate function.
> */
> - simple = simplify_function(expr->opfuncid,
> -
> expr->opresulttype, -1,
> -
> expr->opcollid,
> -
> expr->inputcollid,
> -
> &args,
> -
> false,
> -
> true,
> -
> true,
> -
> context);
> if (simple) /* successfully
> simplified it */
> return (Node *) simple;
>
>
> Then we can get a more efficient query plan:
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=120917.71..*120917.76 *rows=20 width=202) (actual
> time=336.255..344.250 rows=20 loops=1)
> -> Sort (cost=120917.71..120936.18 rows=7387 width=202) (actual
> time=336.254..344.248 rows=20 loops=1)
> Sort Key: (sum((lineitem.l_extendedprice * ((1)::numeric -
> lineitem.l_discount)))) DESC
> Sort Method: top-N heapsort Memory: 34kB
> -> Finalize GroupAggregate (cost=119764.35..120721.15 rows=7387
> width=202) (actual time=271.425..335.755 rows=37925 loops=1)
> Group Key: customer.c_custkey, nation.n_name
> -> Gather Merge (cost=119764.35..120567.25 rows=6156
> width=202) (actual time=271.420..309.049 rows=37925 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial GroupAggregate
> (cost=118764.33..118856.67 rows=3078 width=202) (actual
> time=267.897..296.298 rows=12642 loops=3)
> Group Key: customer.c_custkey, nation.n_name
> -> Sort (cost=118764.33..118772.02 rows=3078
> width=182) (actual time=267.881..271.626 rows=38183 loops=3)
> Sort Key: customer.c_custkey,
> nation.n_name
> Sort Method: external merge Disk: 7248kB
> Worker 0: Sort Method: external merge
> Disk: 7328kB
> Worker 1: Sort Method: external merge
> Disk: 7328kB
> -> Hash Join (cost=113641.60..118585.99
> rows=3078 width=182) (actual time=222.247..249.682 rows=38183 loops=3)
> Hash Cond: (customer.c_nationkey =
> nation.n_nationkey)
> -> Parallel Hash Join
> (cost=113640.04..118574.98 rows=3078 width=160) (actual
> time=222.183..243.657 rows=38183 loops=3)
> Hash Cond:
> (customer.c_custkey = orders.o_custkey)
> -> Parallel Seq Scan on
> customer (cost=0.00..4219.00 rows=62500 width=148) (actual
> time=0.029..5.646 rows=50000 loops=3)
> -> Parallel Hash
> (cost=113601.56..113601.56 rows=3078 width=16) (actual
> time=221.947..221.948 rows=38183 loops=3)
> Buckets: 131072
> (originally 8192) Batches: 1 (originally 1) Memory Usage: 8064kB
> -> Nested Loop
> (cost=0.43..113601.56 rows=3078 width=16) (actual time=0.096..204.897
> rows=38183 loops=3)
> -> Parallel Seq
> Scan on orders (cost=0.00..37062.50 rows=3125 width=8) (actual
> time=0.029..63.908 rows=19046 loops=3)
> Filter:
> ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date
> + '3 mons'::interval month)))
> Rows
> Removed by Filter: 480954
> -> Index Scan
> using lineitem_pkey on lineitem (cost=0.43..24.45 rows=4 width=16) (actual
> time=0.006..0.007 rows=2 loops=57138)
> Index Cond:
> (l_orderkey = orders.o_orderkey)
> Filter:
> (l_returnflag = 'R'::bpchar)
> Rows
> Removed by Filter: 2
> -> Hash (cost=1.25..1.25 rows=25
> width=30) (actual time=0.034..0.035 rows=25 loops=3)
> Buckets: 1024 Batches: 1
> Memory Usage: 10kB
> -> Seq Scan on nation
> (cost=0.00..1.25 rows=25 width=30) (actual time=0.018..0.023 rows=25
> loops=3)
> Planning Time: 1.207 ms
> Execution Time: *345.427 *ms
> (36 rows)
>
>
> The estimated cost is reduced by 39.69%, and the execution time is reduced
> by 32.54%. I measured the execution time on average of 10 executions.
> I am not proposing a fixing patch, as the patch is incorrect. Instead, I
> just want to show disabling the simplify_function() function brings
> performance benefit, and it seems unexpected. I am wondering whether we can
> optimize simplify_function() to make the performance better for this
> workload?
>
>
>
> Environment:
> I used 1 GB data of TPC-H benchmark, and my entire data folder can be
> downloaded here:
> https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing
> The connection string is: postgresql://ubuntu:ubuntu(at)127(dot)0(dot)0(dot)1:5432/tpch"
>
> tpch=# select version();
> version
>
>
> --------------------------------------------------------------------------------------------------
> PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 13.2.0-23ubuntu4) 13.2.0, 64-bit
> (1 row)
>
>
> Best regards,
>
> Jinsheng Ba
>
>
> Notice: This email is generated from the account of an NUS alumnus.
> Contents, views, and opinions therein are solely those of the sender.
>

--

Best Regards,
Shiv Iyer

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2024-10-24 20:49:18 Re: Unexpected Performance for the Function simplify_function
Previous Message Ba Jinsheng 2024-10-24 19:43:54 Unexpected Performance for the Function simplify_function