Re: Performance of query

From: Misa Simic <misa(dot)simic(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Cindy Makarowsky <cindymakarowsky(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of query
Date: 2013-03-23 22:27:36
Message-ID: CAH3i69mnD9ztA5Thjo2YvneyDnKKm5KCAyA-gAGGd_X44hPZ3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Jeff,

It seems my previous mail has not showed up in the list... copied/pasted
again belloew

However, you said something important:

"The join to the "state" table is not necessary. Between the foreign key
and the primary key, you know that every state exists, and that every state
exists only once. But, that will not solve your problem, as the join to
the state table is not where the time goes."

I think it is something what planner could/should be "aware off"... and
discard the join

" Merge Join (cost=0.00..2310285.02 rows=60057056 width=3) (actual
time=38.424..41992.070 rows=60057057 loops=1)"
" Merge Cond: (state.state = busbase.state)"

this part from bellow plan would save significant time if planner didn't
decide to take this step at all ....

Kind regards,

Misa

"
Hi Cindy

TBH - I don't know...

I have added this to list so maybe someone else can help...

To recap:

from start situation (table structure and indexes are in the first mail in
this thread)

EXPLAIN ANALYZE
SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER JOIN
state USING (state)
GROUP BY busbase.state

says:
"HashAggregate (cost=7416975.58..7416976.09 rows=51 width=7) (actual
time=285339.465..285339.473 rows=51 loops=1)"
" -> Hash Join (cost=2.15..7139961.94 rows=55402728 width=7) (actual
time=0.066..269527.934 rows=60057057 loops=1)"
" Hash Cond: (busbase.state = state.state)"
" -> Seq Scan on busbase (cost=0.00..6378172.28 rows=55402728
width=7) (actual time=0.022..251029.307 rows=60057057 loops=1)"
" -> Hash (cost=1.51..1.51 rows=51 width=3) (actual
time=0.028..0.028 rows=51 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 2kB"
" -> Seq Scan on state (cost=0.00..1.51 rows=51 width=3)
(actual time=0.003..0.019 rows=51 loops=1)"
"Total runtime: 285339.516 ms"

on created composite index
CREATE INDEX comp_statidx2
ON busbase
USING btree
(state, id );

we got:

"GroupAggregate (cost=0.00..2610570.81 rows=51 width=3) (actual
time=98.923..51033.888 rows=51 loops=1)"
" -> Merge Join (cost=0.00..2310285.02 rows=60057056 width=3) (actual
time=38.424..41992.070 rows=60057057 loops=1)"
" Merge Cond: (state.state = busbase.state)"
" -> Index Only Scan using state_pkey on state (cost=0.00..13.02
rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)"
" Heap Fetches: 51"
" -> Index Only Scan using comp_statidx2 on busbase
(cost=0.00..1559558.68 rows=60057056 width=3) (actual
time=38.408..12883.575 rows=60057057 loops=1)"
" Heap Fetches: 0"
"Total runtime: 51045.648 ms"

Question is - is it possible to improve it more?
"

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Misa Simic 2013-03-24 00:12:12 PostgreSQL planner
Previous Message Jeff Janes 2013-03-23 21:53:20 Re: Performance of query