Re: Speed up the query

From: Hengky Lie <hengkyliwandouw(at)gmail(dot)com>
To: Torsten Förtsch <torsten(dot)foertsch(at)gmx(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speed up the query
Date: 2013-12-01 17:33:03
Message-ID: CADvRrxN6-=tqEiTvxn4wYcEvsLvQioLuU94BhGNAOOfYGEeEnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear Torsten and friends,

This is another good case to analyse why the query performance is not the
same :

There are 2 query :
(1)

with qry1 as (
select subkategori, kodebarang as produkid, namabarang, keluar,
tbltransaksi.modal*keluar as ttlmodal,
case
when tbltransaksi.discount<=100 then
keluar*(harga - (discount/100*harga))
when tbltransaksi.discount>100
then keluar*(harga-discount)
end as jumlah
from tblpenjualan
join tbltransaksi on tblpenjualan.id=tbltransaksi.jualid
join tblproduk on tbltransaksi.kodebarang=tblproduk.produkid
join tblsubkategori on
tblproduk.subkategoriid=tblsubkategori.tblsubkategoriid
join tblkategori on tblkategori.kategoriid=tblsubkategori.kategoriid
where tblpenjualan.tanggal between '01/01/13' and '31/10/13')

select subkategori,produkid, namabarang , sum(keluar) as ttlkeluar,
sum(jumlah) as jumlah, sum(ttlmodal) as ttlmodal
from qry1
group by subkategori, produkid, namabarang

"QUERY PLAN"
"HashAggregate (cost=99124.61..99780.94 rows=65633 width=334) (actual
time=3422.786..3434.511 rows=24198 loops=1)"
" Buffers: shared hit=14543"
" CTE qry1"
" -> Hash Join (cost=11676.07..76153.06 rows=656330 width=73)
(actual time=181.683..2028.046 rows=657785 loops=1)"
" Hash Cond: ((tbltransaksi.kodebarang)::text =
(tblproduk.produkid)::text)"
" Buffers: shared hit=14543"
" -> Hash Join (cost=7247.75..44651.13 rows=656330
width=31) (actual time=84.885..787.029 rows=658438 loops=1)"
" Hash Cond: (tbltransaksi.jualid = tblpenjualan.id)"
" Buffers: shared hit=13204"
" -> Seq Scan on tbltransaksi (cost=0.00..18730.83
rows=807283 width=35) (actual time=0.005..157.004 rows=807033
loops=1)"
" Buffers: shared hit=10658"
" -> Hash (cost=5293.64..5293.64 rows=156329 width=4)
(actual time=84.842..84.842 rows=154900 loops=1)"
" Buckets: 16384 Batches: 1 Memory Usage: 3631kB"
" Buffers: shared hit=2546"
" -> Seq Scan on tblpenjualan
(cost=0.00..5293.64 rows=156329 width=4) (actual time=0.007..49.444
rows=154900 loops=1)"
" Filter: ((tanggal >= '2013-01-01'::date)
AND (tanggal <= '2013-10-31'::date))"
" Rows Removed by Filter: 27928"
" Buffers: shared hit=2546"
" -> Hash (cost=3364.19..3364.19 rows=85130 width=55)
(actual time=96.736..96.736 rows=84701 loops=1)"
" Buckets: 16384 Batches: 1 Memory Usage: 6323kB"
" Buffers: shared hit=1339"
" -> Hash Join (cost=5.35..3364.19 rows=85130
width=55) (actual time=0.241..62.038 rows=84701 loops=1)"
" Hash Cond: ((tblproduk.subkategoriid)::text =
(tblsubkategori.tblsubkategoriid)::text)"
" Buffers: shared hit=1339"
" -> Seq Scan on tblproduk (cost=0.00..2188.30
rows=85130 width=45) (actual time=0.008..17.549 rows=85035 loops=1)"
" Buffers: shared hit=1337"
" -> Hash (cost=4.23..4.23 rows=90 width=17)
(actual time=0.224..0.224 rows=90 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 4kB"
" Buffers: shared hit=2"
" -> Hash Join (cost=1.09..4.23 rows=90
width=17) (actual time=0.028..0.153 rows=90 loops=1)"
" Hash Cond:
((tblsubkategori.kategoriid)::text = (tblkategori.kategoriid)::text)"
" Buffers: shared hit=2"
" -> Seq Scan on tblsubkategori
(cost=0.00..1.90 rows=90 width=21) (actual time=0.005..0.029 rows=90
loops=1)"
" Buffers: shared hit=1"
" -> Hash (cost=1.04..1.04 rows=4
width=4) (actual time=0.011..0.011 rows=4 loops=1)"
" Buckets: 1024 Batches: 1
Memory Usage: 1kB"
" Buffers: shared hit=1"
" -> Seq Scan on tblkategori
(cost=0.00..1.04 rows=4 width=4) (actual time=0.002..0.004 rows=4
loops=1)"
" Buffers: shared hit=1"
" -> CTE Scan on qry1 (cost=0.00..13126.60 rows=656330 width=334)
(actual time=181.687..2556.526 rows=657785 loops=1)"
" Buffers: shared hit=14543"
"Total runtime: 3454.442 ms"

(2)this is exactly the same query with no.1 except it uses subquery

select subkategori,produkid, namabarang , sum(keluar) as ttlkeluar,
sum(jumlah) as jumlah, sum(ttlmodal) as ttlmodal from
( select subkategori, kodebarang as produkid, namabarang, keluar,
tbltransaksi.modal*keluar as ttlmodal,
case
when tbltransaksi.discount<=100 then
keluar*(harga - (discount/100*harga))
when tbltransaksi.discount>100
then keluar*(harga-discount)
end as jumlah
from tblpenjualan
join tbltransaksi on tblpenjualan.id=tbltransaksi.jualid
join tblproduk on tbltransaksi.kodebarang=tblproduk.produkid
join tblsubkategori on tblproduk.subkategoriid=tblsubkategori.tblsubkategoriid
join tblkategori on tblkategori.kategoriid=tblsubkategori.kategoriid
where tblpenjualan.tanggal between '01/01/13' and '31/10/13')
as dt group by subkategori, produkid, namabarang

The analyse result :

"QUERY PLAN"
"GroupAggregate (cost=124800.44..157616.94 rows=656330 width=73)
(actual time=13895.782..15236.212 rows=24198 loops=1)"
" Buffers: shared hit=14543"
" -> Sort (cost=124800.44..126441.26 rows=656330 width=73) (actual
time=13895.750..14024.911 rows=657785 loops=1)"
" Sort Key: tblsubkategori.subkategori,
tbltransaksi.kodebarang, tblproduk.namabarang"
" Sort Method: quicksort Memory: 103431kB"
" Buffers: shared hit=14543"
" -> Hash Join (cost=11676.07..61385.63 rows=656330 width=73)
(actual time=177.521..1264.431 rows=657785 loops=1)"
" Hash Cond: ((tbltransaksi.kodebarang)::text =
(tblproduk.produkid)::text)"
" Buffers: shared hit=14543"
" -> Hash Join (cost=7247.75..44651.13 rows=656330
width=31) (actual time=84.473..739.064 rows=658438 loops=1)"
" Hash Cond: (tbltransaksi.jualid = tblpenjualan.id)"
" Buffers: shared hit=13204"
" -> Seq Scan on tbltransaksi
(cost=0.00..18730.83 rows=807283 width=35) (actual time=0.005..146.601
rows=807033 loops=1)"
" Buffers: shared hit=10658"
" -> Hash (cost=5293.64..5293.64 rows=156329
width=4) (actual time=84.429..84.429 rows=154900 loops=1)"
" Buckets: 16384 Batches: 1 Memory Usage: 3631kB"
" Buffers: shared hit=2546"
" -> Seq Scan on tblpenjualan
(cost=0.00..5293.64 rows=156329 width=4) (actual time=0.008..48.968
rows=154900 loops=1)"
" Filter: ((tanggal >=
'2013-01-01'::date) AND (tanggal <= '2013-10-31'::date))"
" Rows Removed by Filter: 27928"
" Buffers: shared hit=2546"
" -> Hash (cost=3364.19..3364.19 rows=85130 width=55)
(actual time=92.998..92.998 rows=84701 loops=1)"
" Buckets: 16384 Batches: 1 Memory Usage: 6323kB"
" Buffers: shared hit=1339"
" -> Hash Join (cost=5.35..3364.19 rows=85130
width=55) (actual time=0.240..59.587 rows=84701 loops=1)"
" Hash Cond: ((tblproduk.subkategoriid)::text
= (tblsubkategori.tblsubkategoriid)::text)"
" Buffers: shared hit=1339"
" -> Seq Scan on tblproduk
(cost=0.00..2188.30 rows=85130 width=45) (actual time=0.008..16.942
rows=85035 loops=1)"
" Buffers: shared hit=1337"
" -> Hash (cost=4.23..4.23 rows=90
width=17) (actual time=0.221..0.221 rows=90 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 4kB"
" Buffers: shared hit=2"
" -> Hash Join (cost=1.09..4.23
rows=90 width=17) (actual time=0.028..0.142 rows=90 loops=1)"
" Hash Cond:
((tblsubkategori.kategoriid)::text = (tblkategori.kategoriid)::text)"
" Buffers: shared hit=2"
" -> Seq Scan on tblsubkategori
(cost=0.00..1.90 rows=90 width=21) (actual time=0.006..0.046 rows=90
loops=1)"
" Buffers: shared hit=1"
" -> Hash (cost=1.04..1.04
rows=4 width=4) (actual time=0.012..0.012 rows=4 loops=1)"
" Buckets: 1024 Batches: 1
Memory Usage: 1kB"
" Buffers: shared hit=1"
" -> Seq Scan on
tblkategori (cost=0.00..1.04 rows=4 width=4) (actual
time=0.002..0.005 rows=4 loops=1)"
" Buffers: shared hit=1"
"Total runtime: 15244.038 ms"

This is my Postgresqlconf :
max_connections=50
shared_buffers=1024MB
wall_buffers=16MB
max_prepared_transactions=0
work_mem=50MB
maintenance_work_mem=256MB

Thanks

On Sun, Dec 1, 2013 at 9:39 PM, Torsten Förtsch <torsten(dot)foertsch(at)gmx(dot)net>wrote:

> On 01/12/13 13:40, Hengky Liwandouw wrote:
> > Torsten, your 2nd option works now. I dont know maybe copy and paste
> error. I just want to report that your 2nd option with work_mem=100MB
> required the same amount of time (about 58 seconds), while my query
> required 4.9 seconds.
> >
> > What make this two query so different ?
> >
> Without the "explain (analyze,buffers) ..." it's hard to say. A CTE is
> currently a way to trick the query planner because it's planned
> separately. A subquery on the other hand is integrated in the outer
> query and planned/optimized as one thing.
>
> If your planner parameters are correctly set up, the subquery should
> almost always outrun the CTE. Often, though, not much.
>
> Now, you may ask why CTE then exist at all? There are things that cannot
> be expressed without them, in particular WITH RECURSIVE.
>
> The fact that it performs so badly as a subquery indicates that either
> your table statistics are suboptimal or more probably the planner
> parameters or work_mem.
>
> Another point I have just noticed, how does it perform if you change
>
> and extract(... from tanggal)='2013'
>
> to
>
> and '2013-01-01'::date <= tanggal
> and tanggal < '2013-01-01'::date + '1 year'::interval
>
> Also, I think it would be possible to even get rid of the subquery. At
> least you can get rid of the tanggal and jumlah output from the subquery.
>
> select s.id, s.nama, t.kodebarang, p.namabarang,
> sum(case when extract(month from t.tanggal) = 1
> then t.keluar else 0 end) as jan,
> sum(case when extract(month from t.tanggal) = 2
> then t.keluar else 0 end) as feb,
> ...,
> sum(t.keluar) as total
> from tbltransaksi t
> join tblproduk p on t.kodebarang=p.produkid
> join tblsupplier s on p.supplierid=s.id
> where (t.jualid is not null or t.returjualid is not null)
> and '2013-01-01'::date <= t.tanggal
> and t.tanggal < '2013-01-01'::date + '1 year'::interval
> group by s.id, s.nama, t.kodebarang, p.namabarang
> order by total desc
> limit 1000
>
> would be interesting to see the "explain (analyze,buffers)" output for
> the query above.
>
> Please double-check the query. I think it should do exactly the same as
> your query. But you know, shit happens.
>
> BTW, am I right in assuming that you are from Malaysia or Indonesia? I
> am trying to learn a bit of Malay. I am a complete beginner, though.
>
> Selamat berjaya (is that possible to wish you success?)
> Torsten
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Hengky Lie 2013-12-01 17:41:28 Re: Speed up the query
Previous Message Hengky Lie 2013-12-01 16:14:38 Re: Speed up the query