Re: Performance Issue on Query 18 of TPC-H Benchmark

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark
Date: 2024-10-17 02:14:35
Message-ID: CAApHDvo7najDZTg4=KMDsBTwE0+9K8=AxAgLxQTT1_AOGkYP=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, 17 Oct 2024 at 07:26, Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu> wrote:
>
> >I believe you are not allowing the optimizer to generate a different aggregation path (Group Aggregate) because it requires a sort operation. So I think this is not correct.
>
> Yes, this is what I did. I though it is what you were asking? I have not found another way to enforce HashAggregate, so I directly modified the code. Can you eliberate why it is incorrect?

Like I mentioned earlier, GroupAggregate needs its input to be
guaranteed to be sorted by the group key. Above you've hacked the
planner to produce:

-> GroupAggregate (cost=0.00..208346.45 rows=126825 width=4) (actual
time=334.397..1549.837 rows=9 loops=3)
Group Key:
lineitem_1.l_orderkey
Filter:
(sum(lineitem_1.l_quantity) > '314'::numeric)
Rows Removed
by Filter: 1500388
-> Seq Scan
on lineitem lineitem_1 (cost=0.00..172626.23 rows=6002623 width=9)
(actual time=0.051..438.226 rows=6001215 loops=3)

This is simply not a valid plan. GroupAggreate relies on the rows
arriving in Group Key order as it checks if the current row is in the
same group as the previous row. When it's not, that group is classes
as complete and the HAVING clause can be evaluated. It's possible the
plan you've ended up when happens to produce the correct results
before the lineitem table is already in l_orderkey order. Try
updating one of the earlier rows in a way that puts the heap out of
order and you'll likely notice the "Rows Removed by Filter" change. Or
try without the HAVING clause and observe the number of groups
changing.

I strongly suggest you experiment further before proposing changes in
this area. Also, this is not a valid discussion for the pgsql-bugs
mailing list. This list is about reporting newly discovered bugs.
It's not a place to discuss proposing new ones.

David

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrei Lepikhov 2024-10-17 02:51:12 Re: Performance Issue on Query 18 of TPC-H Benchmark
Previous Message Tom Lane 2024-10-17 00:56:52 Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault