Re: BUG #15001: planner cann't distinguish composite index?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: 德哥 <digoal(at)126(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15001: planner cann't distinguish composite index?
Date: 2018-01-20 06:15:27
Message-ID: CAA4eK1Kg1ffscayD+dENWEzajW5fFBX=YhkD4KvLOjsHVKGweA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jan 18, 2018 at 3:14 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>
> On 01/10/2018 04:35 AM, Amit Kapila wrote:
>> On Tue, Jan 9, 2018 at 4:55 PM, PG Bug reporting form
>> <noreply(at)postgresql(dot)org> wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 15001
>>> Logged by: Zhou Digoal
>>> Email address: digoal(at)126(dot)com
>>> PostgreSQL version: 10.1
>>> Operating system: centos 7.x x64
>>> Description:
>>>
>>> ```
>>> postgres=# create table tbl(c1 int, c2 int, c3 int);
>>> CREATE TABLE
>>> postgres=# create index idx_tbl on tbl (c1,c2);
>>> CREATE INDEX
>>> postgres=# insert into tbl select random()*100, random()*10 from
>>> generate_series(1,10000000);
>>> INSERT 0 10000000
>>>
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>> QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------------------------------
>>> Finalize GroupAggregate (cost=176259.41..176337.18 rows=1111 width=16)
>>> Group Key: c2, c1
>>> -> Sort (cost=176259.41..176276.08 rows=6666 width=16)
>>> Sort Key: c2, c1
>>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>>> Workers Planned: 6
>>> -> Partial HashAggregate (cost=174158.32..174169.43
>>> rows=1111 width=16)
>>> Group Key: c2, c1
>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (9 rows)
>>>
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c1,c2;
>>> QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------------------------
>>> Finalize GroupAggregate (cost=1000.53..176040.80 rows=1111 width=16)
>>> Group Key: c1, c2
>>> -> Gather Merge (cost=1000.53..175979.69 rows=6666 width=16)
>>> Workers Planned: 6
>>> -> Partial GroupAggregate (cost=0.43..174169.43 rows=1111
>>> width=16)
>>> Group Key: c1, c2
>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (7 rows)
>>> ```
>>>
>>> i need to set enable_sort=off, so planner can choose the same planner with
>>> c1,c2 and c2,c1 group by.
>>>
>>> ```
>>> postgres=# set enable_sort=off;
>>> SET
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>> QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------------------------
>>> Finalize HashAggregate (cost=175886.03..175897.14 rows=1111 width=16)
>>> Group Key: c2, c1
>>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>>> Workers Planned: 6
>>> -> Partial HashAggregate (cost=174158.32..174169.43 rows=1111
>>> width=16)
>>> Group Key: c2, c1
>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (7 rows)
>>> ```
>>>
>>
>> I don't think the plan chosen is same. If you notice, for c1,c2 the
>> plan chosen is Finalize GroupAggregate -> Gather Merge whereas for
>> c2,c1 (with sort off) it is Finalize HashAggregate -> Gather. There is
>> a lot of difference between both the plans. The output by Gather
>> Merge is sorted, so you can directly use GroupAggregate whereas the
>> output of Gather is unsorted, so the final plan is HashAggregate.
>>
>> Now, here one can wonder why the planner hasn't chosen the path
>> without Sort for c2,c1 even when enable_sort=on as the cost of that
>> plan is less. If you see the cost difference of plans with sort
>> (total_cost - 176337.18) and without sort (total_cost - 175897.14), it
>> is marginal and planner thinks that they are fuzzily same. The same
>> is true for startup costs as well. Now, if both have same costs
>> (fuzzily), it gives preference to the sorted path.
>>
>
> The original report is quite unclear - it only shows some plans but does
> not explain why it's a bug or what's the expected behavior.
>
> The fuzzy comparison of costs certainly explains at least some of it,
> but my feeling is it's likely related to the fact that group by has to
> match a composite index including the order of columns.
>
> That is, an index defined on (c1,c2) is unusable for (GROUP BY c2,c1).
> Which is why GROUP BY c1,c2 uses GroupAggregate, while GROUP BY c2,c1
> uses HashAggregate.
>

Sure, but it can use sort node for c2,c1 to make use of GroupAggregate
if such a plan is cheap and that is what has happened in the first
case of this report. I have tried to explain the reason why planner
has chosen the specific plan in each of the three cases. I guess the
OP might have confused between second and third plan which appears to
be somewhat similar but are actually quite different.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Francisco Olarte 2018-01-20 09:51:10 Re: BUG #15021: Postgres crashes unexpectedly
Previous Message Wallace Baggaley 2018-01-20 00:32:06 Re: BUG #15021: Postgres crashes unexpectedly