Optimize sort before groupping

From: pasman pasmański <pasman(dot)p(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Optimize sort before groupping
Date: 2012-02-16 09:31:15
Message-ID: CAOWY8=Z6ho+jLi2QgVfBDz8MjCmfrGOtJ3JLkYixK4NZT7ebkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi.

Sometimes order of rows readed from index allows to skip sort node.
But sometimes planner don't deduce it. In example below order from index
"NumerStacji_NumerKierunkowy_KodBłędu_LP"
is preserved in merge join and can be used in groupping node, but planner
don't see it.

First query and explain:

SELECT
"NKA","NTA",
count(nullif(b."Pierwszy zwrot"=b."DataPliku",false)) as "CDR",
min(b."Data") || ',' || max(b."Data") as "Biling",
b."KodBłędu",
sum((b."Pierwszy zwrot"=b."DataPliku" and not b."Darmowe")::integer)
as "Odpłatne",
max(r."LP")
FROM
"Bladpol2" b left join "Rejestr stacji do naprawy" r
on (
b."NTA" = r."Numer stacji"
and b."NKA" = r."Numer kierunkowy"
and b."KodBłędu" = r."Kod Błędu"
and replace(b."Data",':','.')::cube && r."Zakres"
)
WHERE
b."KodBłędu" similar to '74|80|81' and r."Wartość" is null
GROUP BY
b."NTA",b."NKA",b."KodBłędu",r."LP"
HAVING
not bool_and(b."Darmowe")
ORDER BY
max(b."Data") desc
LIMIT 4000;

QUERY PLAN
Limit (cost=191422.67..191432.67 rows=4000 width=42) (actual
time=57136.554..57161.084 rows=4000 loops=1)
-> Sort (cost=191422.67..192855.26 rows=573034 width=42) (actual
time=57136.546..57145.420 rows=4000 loops=1)
" Sort Key: (max((b.""Data"")::text))"
Sort Method: top-N heapsort Memory: 660kB
-> GroupAggregate (cost=122756.63..154273.50 rows=573034
width=42) (actual time=49821.500..56670.665 rows=64064 loops=1)
" Filter: (NOT bool_and(b.""Darmowe""))"
-> Sort (cost=122756.63..124189.21 rows=573034
width=42) (actual time=49821.318..51902.438 rows=865978 loops=1)
" Sort Key: b.""NTA"", b.""NKA"", b.""KodBłędu"", r.""LP"""
Sort Method: external sort Disk: 42824kB
-> Merge Left Join (cost=19.16..60017.63
rows=573034 width=42) (actual time=0.337..26655.744 rows=865978
loops=1)
" Merge Cond: (((b.""NTA"")::text =
(r.""Numer stacji"")::text) AND ((b.""NKA"")::text = (r.""Numer
kierunkowy"")::text) AND ((b.""KodBłędu"")::text = (r.""Kod
Błędu"")::text))"
" Join Filter: ((replace((b.""Data"")::text,
':'::text, '.'::text))::cube && r.""Zakres"")"
" Filter: (r.""Wartość"" IS NULL)"
" -> Index Scan using
""Bladpol2_nta_nka_kod_błędu_btree"" on ""Bladpol2"" b
(cost=0.00..46593.65 rows=1452312 width=38) (actual
time=0.152..8513.305 rows=1439383 loops=1)"
" Filter: ((""KodBłędu"")::text ~
'^(?:74|80|81)$'::text)"
" -> Index Scan using
""NumerStacji_NumerKierunkowy_KodBłędu_LP"" on ""Rejestr stacji do
naprawy"" r (cost=0.00..1405.83 rows=70476 width=47) (actual
time=0.014..3146.595 rows=1045687 loops=1)"
Total runtime: 57183.476 ms

In second query planner skip sort before groupping:

SELECT
"NKA","NTA",
count(nullif(b."Pierwszy zwrot"=b."DataPliku",false)) as "CDR",
min(b."Data") || ',' || max(b."Data") as "Biling",
b."KodBłędu",
sum((b."Pierwszy zwrot"=b."DataPliku" and not b."Darmowe")::integer)
as "Odpłatne",
max(r."LP")
FROM
"Bladpol2" b left join "Rejestr stacji do naprawy" r
on (
b."NTA" = r."Numer stacji"
and b."NKA" = r."Numer kierunkowy"
and b."KodBłędu" = r."Kod Błędu"
and replace(b."Data",':','.')::cube && r."Zakres"
)
WHERE
b."KodBłędu" similar to '74|80|81' and r."Wartość" is null
GROUP BY
b."NTA",b."NKA",b."KodBłędu"--,r."LP"
HAVING
not bool_and(b."Darmowe")
ORDER BY
max(b."Data") desc
LIMIT 4000;

QUERY PLAN
Limit (cost=91667.54..91677.54 rows=4000 width=42) (actual
time=32004.992..32029.539 rows=4000 loops=1)
-> Sort (cost=91667.54..92030.62 rows=145232 width=42) (actual
time=32004.983..32013.844 rows=4000 loops=1)
" Sort Key: (max((b.""Data"")::text))"
Sort Method: top-N heapsort Memory: 660kB
-> GroupAggregate (cost=19.16..82252.30 rows=145232
width=42) (actual time=1.954..31534.246 rows=63759 loops=1)
" Filter: (NOT bool_and(b.""Darmowe""))"
-> Merge Left Join (cost=19.16..60017.63 rows=573034
width=42) (actual time=0.339..26669.766 rows=865978 loops=1)
" Merge Cond: (((b.""NTA"")::text = (r.""Numer
stacji"")::text) AND ((b.""NKA"")::text = (r.""Numer
kierunkowy"")::text) AND ((b.""KodBłędu"")::text = (r.""Kod
Błędu"")::text))"
" Join Filter: ((replace((b.""Data"")::text,
':'::text, '.'::text))::cube && r.""Zakres"")"
" Filter: (r.""Wartość"" IS NULL)"
" -> Index Scan using
""Bladpol2_nta_nka_kod_błędu_btree"" on ""Bladpol2"" b
(cost=0.00..46593.65 rows=1452312 width=38) (actual
time=0.145..8622.003 rows=1439383 loops=1)"
" Filter: ((""KodBłędu"")::text ~
'^(?:74|80|81)$'::text)"
" -> Index Scan using
""NumerStacji_NumerKierunkowy_KodBłędu_LP"" on ""Rejestr stacji do
naprawy"" r (cost=0.00..1405.83 rows=70476 width=47) (actual
time=0.014..3113.398 rows=1045687 loops=1)"
Total runtime: 32045.317 ms

--
------------
pasman

Browse pgsql-general by date

  From Date Subject
Next Message Jan Otto 2012-02-16 10:22:39 Re: Dump functions alone
Previous Message Guillaume Lelarge 2012-02-16 09:20:06 Re: Unable to execute \copy from Client Application