Re: Query that took a lot of time in Postgresql when not using trim in order by

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query that took a lot of time in Postgresql when not using trim in order by
Date: 2015-11-29 13:23:23
Message-ID: 20151129132323.GA29654@hjp.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote:
> Fast:
>
> Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
> Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
> Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
> Sort Method: quicksort Memory: 13121kB
> -> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
> Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
> -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)
>
>
> Slow:
>
> Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
> Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
> -> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
> Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
> Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
> Sort Method: quicksort Memory: 265592kB
> -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
>
>
> The difference is in the top of plans.
> As we see, hashjoin time is practically the same.
> But fast plan uses hashagg first and only 43k rows require sorting.
> Slow plan dominated by sorting 900k rows.
>
> I wonder if increasing cpu_tuple_cost will help.
> As cost difference between two plans is negligible now.

Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
in-memory sort of 900k rows seems slow to me. I tested this on my PC at
home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort
1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96
bytes constant (so only the last 32 were random), it took only 21
seconds. Either this CPU is really slow or the data is heavily skewed -
is it possible that all dimensions except dim_vendedor.a3_nome have only
one or very few values? In that case changing the sort order might help.

hp

--
_ | Peter J. Holzer | I want to forget all about both belts and
|_|_) | | suspenders; instead, I want to buy pants
| | | hjp(at)hjp(dot)at | that actually fit.
__/ | http://www.hjp.at/ | -- http://noncombatant.org/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2015-11-29 17:58:36 Re: Query that took a lot of time in Postgresql when not using trim in order by
Previous Message Jeff Janes 2015-11-28 19:25:47 Re: Index scan cost calculation