Quick Links

Re: Print correct startup cost for the group aggregate.

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Print correct startup cost for the group aggregate.
Date:	2017-03-06 04:32:17
Message-ID:	CAFjFpRea7Gsx3wWg7B1ZV5ZNgfFZWNXR2po1X545Ly+rcmeicA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Mar 4, 2017 at 2:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>>> While reading through the cost_agg() I found that startup cost for the
>>> group aggregate is not correctly assigned. Due to this explain plan is
>>> not printing the correct startup cost.
>>>
>>> Without patch:
>>>
>>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>>> filler like '%foo%' group by aid;
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------
>>> GroupAggregate (cost=81634.33..85102.04 rows=198155 width=12)
>>> Group Key: aid
>>> -> Sort (cost=81634.33..82129.72 rows=198155 width=8)
>>> Sort Key: aid
>>> -> Seq Scan on pgbench_accounts (cost=0.00..61487.89 rows=198155
>>> width=8)
>>> Filter: (filler ~~ '%foo%'::text)
>>> (6 rows)
>>>
>>> With patch:
>>>
>>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>>> filler like '%foo%' group by aid;
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------
>>> GroupAggregate (cost=82129.72..85102.04 rows=198155 width=12)
>>> Group Key: aid
>>> -> Sort (cost=81634.33..82129.72 rows=198155 width=8)
>>> Sort Key: aid
>>> -> Seq Scan on pgbench_accounts (cost=0.00..61487.89 rows=198155
>>> width=8)
>>> Filter: (filler ~~ '%foo%'::text)
>>> (6 rows)
>>>
>>
>> The reason the reason why startup_cost = input_startup_cost and not
>> input_total_cost for aggregation by sorting is we don't need the whole
>> input before the Group/Agg plan can produce the first row. But I think
>> setting startup_cost = input_startup_cost is also not exactly correct.
>> Before the plan can produce one row, it has to transit through all the
>> rows belonging to the group to which the first row belongs. On an
>> average it has to scan (total number of rows)/(number of groups)
>> before producing the first aggregated row. startup_cost will be
>> input_startup_cost + cost to scan (total number of rows)/(number of
>> groups) rows + cost of transiting over those many rows. Total cost =
>> startup_cost + cost of scanning and transiting through the remaining
>> number of input rows.
>
> While that idea has some merit, I think it's inconsistent with current
> practice. cost_seqscan(), for example, doesn't include the cost of
> reading the first page in the startup cost, even though that certainly
> must be done before returning the first row.

OTOH, while costing for merge join, initial_cost_mergejoin() seems to
consider the cost of scanning outer and inner relations upto the first
matching tuple on both sides in the startup_cost. Different policies
at different places.

> I think there have been
> previous discussions of switching over to the practice for which you
> are advocating here, but my impression (without researching) is that
> the current practice is more like what Rushabh did.
>

I am not sure Rushabh's approach is correct. Here's the excerpt from my mail.

>> The reason the reason why startup_cost = input_startup_cost and not
>> input_total_cost for aggregation by sorting is we don't need the whole
>> input before the Group/Agg plan can produce the first row.

With Rushabh's approach the startup cost of aggregation by sorting
would be way higher that it's right now. Secondly, it would match that
of hash aggregation and thus forcing hash aggregation to be chosen in
almost all the cases.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: Print correct startup cost for the group aggregate. at 2017-03-04 09:20:10 from Robert Haas

Responses

Re: Print correct startup cost for the group aggregate. at 2017-03-06 09:16:59 from Rushabh Lathia
Re: Print correct startup cost for the group aggregate. at 2017-03-09 01:30:18 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ashutosh Sharma	2017-03-06 05:09:18	Re: Parallel seq. plan is not coming against inheritance or partition table
Previous Message	Ashutosh Bapat	2017-03-06 04:00:47	Re: dropping partitioned tables without CASCADE