From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | 德哥 <digoal(at)126(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15001: planner cann't distinguish composite index? |
Date: | 2018-01-21 01:08:25 |
Message-ID: | c0f75546-d028-125c-3581-9cc96b7c1c3a@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 01/20/2018 07:15 AM, Amit Kapila wrote:
> On Thu, Jan 18, 2018 at 3:14 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>>
>> On 01/10/2018 04:35 AM, Amit Kapila wrote:
>>> On Tue, Jan 9, 2018 at 4:55 PM, PG Bug reporting form
>>> <noreply(at)postgresql(dot)org> wrote:
>>>> The following bug has been logged on the website:
>>>>
>>>> Bug reference: 15001
>>>> Logged by: Zhou Digoal
>>>> Email address: digoal(at)126(dot)com
>>>> PostgreSQL version: 10.1
>>>> Operating system: centos 7.x x64
>>>> Description:
>>>>
>>>> ```
>>>> postgres=# create table tbl(c1 int, c2 int, c3 int);
>>>> CREATE TABLE
>>>> postgres=# create index idx_tbl on tbl (c1,c2);
>>>> CREATE INDEX
>>>> postgres=# insert into tbl select random()*100, random()*10 from
>>>> generate_series(1,10000000);
>>>> INSERT 0 10000000
>>>>
>>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------------------------------------------
>>>> Finalize GroupAggregate (cost=176259.41..176337.18 rows=1111 width=16)
>>>> Group Key: c2, c1
>>>> -> Sort (cost=176259.41..176276.08 rows=6666 width=16)
>>>> Sort Key: c2, c1
>>>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>>>> Workers Planned: 6
>>>> -> Partial HashAggregate (cost=174158.32..174169.43
>>>> rows=1111 width=16)
>>>> Group Key: c2, c1
>>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>>> (9 rows)
>>>>
>>>> postgres=# explain select c1,c2 , count(*) from tbl group by c1,c2;
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------------------------------------
>>>> Finalize GroupAggregate (cost=1000.53..176040.80 rows=1111 width=16)
>>>> Group Key: c1, c2
>>>> -> Gather Merge (cost=1000.53..175979.69 rows=6666 width=16)
>>>> Workers Planned: 6
>>>> -> Partial GroupAggregate (cost=0.43..174169.43 rows=1111
>>>> width=16)
>>>> Group Key: c1, c2
>>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>>> (7 rows)
>>>> ```
>>>>
>>>> i need to set enable_sort=off, so planner can choose the same planner with
>>>> c1,c2 and c2,c1 group by.
>>>>
>>>> ```
>>>> postgres=# set enable_sort=off;
>>>> SET
>>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------------------------------------
>>>> Finalize HashAggregate (cost=175886.03..175897.14 rows=1111 width=16)
>>>> Group Key: c2, c1
>>>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>>>> Workers Planned: 6
>>>> -> Partial HashAggregate (cost=174158.32..174169.43 rows=1111
>>>> width=16)
>>>> Group Key: c2, c1
>>>> -> Parallel Index Only Scan using idx_tbl on tbl
>>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>>> (7 rows)
>>>> ```
>>>>
>>>
>>> I don't think the plan chosen is same. If you notice, for c1,c2 the
>>> plan chosen is Finalize GroupAggregate -> Gather Merge whereas for
>>> c2,c1 (with sort off) it is Finalize HashAggregate -> Gather. There is
>>> a lot of difference between both the plans. The output by Gather
>>> Merge is sorted, so you can directly use GroupAggregate whereas the
>>> output of Gather is unsorted, so the final plan is HashAggregate.
>>>
>>> Now, here one can wonder why the planner hasn't chosen the path
>>> without Sort for c2,c1 even when enable_sort=on as the cost of that
>>> plan is less. If you see the cost difference of plans with sort
>>> (total_cost - 176337.18) and without sort (total_cost - 175897.14), it
>>> is marginal and planner thinks that they are fuzzily same. The same
>>> is true for startup costs as well. Now, if both have same costs
>>> (fuzzily), it gives preference to the sorted path.
>>>
>>
>> The original report is quite unclear - it only shows some plans but does
>> not explain why it's a bug or what's the expected behavior.
>>
>> The fuzzy comparison of costs certainly explains at least some of it,
>> but my feeling is it's likely related to the fact that group by has to
>> match a composite index including the order of columns.
>>
>> That is, an index defined on (c1,c2) is unusable for (GROUP BY c2,c1).
>> Which is why GROUP BY c1,c2 uses GroupAggregate, while GROUP BY c2,c1
>> uses HashAggregate.
>>
>
> Sure, but it can use sort node for c2,c1 to make use of GroupAggregate
> if such a plan is cheap and that is what has happened in the first
> case of this report. I have tried to explain the reason why planner
> has chosen the specific plan in each of the three cases. I guess the
> OP might have confused between second and third plan which appears to
> be somewhat similar but are actually quite different.
>
Yes, I understand that, and I agree with your analysis regarding fuzzy
cost comparisons.
My point is the first query can't possibly use the same plan shown for
the second one (I mean, the one with index only scan and group aggregate
leveraging the sorted output directly) because the GROUP BY clause does
not match the index definition.
That being said, the more I look at the plans the more I'm confused
about what the OP might have been asking about :-/
regrds
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2018-01-21 02:59:24 | BUG #15023: problem with pg_statistic |
Previous Message | Tomas Vondra | 2018-01-21 00:56:30 | Re: BUG #15021: Postgres crashes unexpectedly |