Re: Slow HashAggregate/cache access

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Alexandre de Arruda Paes <adaldeia(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow HashAggregate/cache access
Date: 2015-08-05 22:07:27
Message-ID: CAKJS1f8HsLSLKvW6YhCMCTGbAmBAQFBWiyxyv5c6WNo0XB=DAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6 August 2015 at 06:25, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
>
> On Wed, Aug 5, 2015 at 11:41 AM, Alexandre de Arruda Paes <
> adaldeia(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> First, sorry to compare Post with other database system, but I know
>> nothing about Oracle...
>>
>> This customer have an application made with a framework thats generates
>> the SQL statements (so, We can't make any query optimizations) .
>>
>> We did the following tests:
>>
>> 1) Postgresql 9.3 and Oracle 10 in a desktop machine(8 GB RAM, 1 SATA
>> disk,Core i5)
>> 2) Postgresql 9.3 in a server + FC storage (128 GB RAM, Xeon 32 cores,
>> SAS disks)
>>
>>
>> ​I think I know where issue is.
> The PostgreSQL planner unable pass join conditions into subquery with
> aggregate functions (it's well known limitation).
>
>
I think this statement is quite misleading. Let's look at an example:

create table t1 (a int not null, v int not null);
create table t2 (a int not null);
insert into t1 select s.i,10 from generate_series(1,1000)
s(i),generate_series(1,1000);
insert into t2 select generate_series(1,1000);
create index on t1 (a);

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s
inner join t2 on t2.a = s.a where t2.a = 1;
QUERY PLAN
----------------------------------------------------------------------------------
Nested Loop (cost=0.42..59.76 rows=1 width=12)
-> GroupAggregate (cost=0.42..42.24 rows=1 width=8)
Group Key: t1.a
-> Index Scan using t1_a_idx on t1 (cost=0.42..37.38 rows=969
width=8)
Index Cond: (a = 1)
-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)
Filter: (a = 1)
(7 rows)

As you can see, the predicate is pushes down just fine into a subquery with
aggregates.

The likely reason that PostgreSQL Is not behaving the same as SQL Server
and Oracle is because the predicate pushdowns are limited to equality
operators only as internally these are all represented by a series of
"equivalence classes" which in this case say that 1 = t2.a = t1.a,
therefore it's possible to apply t1.a = 1 at the lowest level.

These equivalence classes don't currently handle non-equality operators.
Here's an example:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s
inner join t2 on t2.a = s.a where t2.a <= 1;
QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=19442.51..19466.27 rows=1 width=12)
Hash Cond: (t1.a = t2.a)
-> HashAggregate (cost=19425.00..19435.00 rows=1000 width=8)
Group Key: t1.a
-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8)
-> Hash (cost=17.50..17.50 rows=1 width=4)
-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)
Filter: (a <= 1)
(8 rows)

Notice the seq scan on t1 instead of the index scan on t1_a_idx.

A way around this is to manually push the predicate down into the subquery:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 where t1.a <=
1 group by a) s inner join t2 on t2.a = s.a where t2.a <= 1;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (cost=0.42..21.98 rows=1 width=12)
Join Filter: (t1.a = t2.a)
-> GroupAggregate (cost=0.42..4.46 rows=1 width=8)
Group Key: t1.a
-> Index Scan using t1_a_idx on t1 (cost=0.42..4.44 rows=1
width=8)
Index Cond: (a <= 1)
-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)
Filter: (a <= 1)
(8 rows)

The query in question is likely performing badly because of this:

-> Seq Scan on fr13t1 (cost=0.00..25072.50
rows=311800 width=21) (actual time=0.007..115.766 rows=311800 loops=1)
Filter: (fr01codemp = '1'::smallint)
Buffers: shared hit=21175

Just how selective is fr01codemp = '1'::smallint ? Is there an index on
that column ?

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alexandre de Arruda Paes 2015-08-05 23:58:56 Re: Slow HashAggregate/cache access
Previous Message David Rowley 2015-08-05 21:06:31 Re: Slow HashAggregate/cache access