Re: improving performance of UNION and ORDER BY

From: Darren Ferguson <darren(at)crystalballinc(dot)com>
To: Chris Gamache <cgg007(at)yahoo(dot)com>
Cc: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>, pgsql-general(at)postgresql(dot)org
Subject: Re: improving performance of UNION and ORDER BY
Date: 2002-03-04 15:37:45
Message-ID: Pine.LNX.4.10.10203041034090.27113-100000@thread.crystalballinc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This is not always faster but sometimes when using Postgres i have found
it to be significantly see below marked ----->>

I have removed the join from the where clause and put it in the from
clause .

Probably wont make a difference but i have had 7 occassions when it have
been significantly faster and i mean cutting 9 seconds to 1 second etc

Hope this helps

Darren Ferguson

On Mon, 4 Mar 2002, Chris Gamache wrote:

> New Query...
>
> (select
> a.username as "User",
> a.trans_date as "Date",
> tl.longtype as "Type",
> a.trans_data as "Query Data",
> a.trans_charge as "Charged",
> a.user_reference_id as "Reference ID"
> from a_trans_log a
--->> join addtypelong tl on a.trans_type = t1.shorttype
> where (
> (a.trans_date >= '12/31/01'::TIMESTAMP)
> ) order by 4 desc, 2 limit 20)
> union
> (select
> b.username as "User",
> b.trans_date as "Date",
> tl.longtype as "Type",
> b.trans_data as "Query Data",
> b.trans_charge as "Charged",
> b.user_reference_id as "Reference ID"
> from b_trans_log b
--->> join addtypelong tl on b.trans_type = t1.shorttype
> where (
> (b.trans_date >= '12/31/01'::TIMESTAMP)
> ) order by 4 desc, 2 limit 20)
> order by 4 desc, 2 limit 20;
>
> Limit (cost=12674.47..12674.47 rows=4 width=84)
> -> Sort (cost=12674.47..12674.47 rows=4 width=84)
> -> Unique (cost=12673.83..12674.43 rows=4 width=84)
> -> Sort (cost=12673.83..12673.83 rows=40 width=84)
> -> Append (cost=12055.58..12672.77 rows=40 width=84)
> -> Subquery Scan *SELECT* 1
> (cost=12055.58..12055.58 rows=20 width=84)
> -> Limit (cost=12055.58..12055.58 rows=20
> width=84)
> -> Sort (cost=12055.58..12055.58
> rows=23724 width=84)
> -> Hash Join (cost=1.20..9674.89
> rows=23724 width=84)
> -> Seq Scan on a_trans_log a
> (cost=0.00..8695.30 rows=24455 width=60)
> -> Hash (cost=1.16..1.16
> rows=16 width=24)
> -> Seq Scan on
> addtypelong tl (cost=0.00..1.16 rows=16 width=24)
> -> Subquery Scan *SELECT* 2 (cost=617.19..617.19
> rows=20 width=84)
> -> Limit (cost=617.19..617.19 rows=20
> width=84)
> -> Sort (cost=617.19..617.19 rows=2462
> width=84)
> -> Hash Join (cost=1.20..478.50
> rows=2462 width=84)
> -> Seq Scan on b_trans_log b
> (cost=0.00..378.61 rows=2462 width=60)
> -> Hash (cost=1.16..1.16
> rows=16 width=24)
> -> Seq Scan on
> addtypelong tl (cost=0.00..1.16 rows=16 width=24)
>
> That cut the query down to 6 seconds. About 5 seconds longer than I would like
> it. I'll take any performance increase, tho. I tried SET enable_seqscan=off;
> And it takes 8 seconds. Strange indeed.
>
> The reason I don't put the two tables together is that the tables don't have
> the exact same structure... I suppose I could merge the two, but it would be
> counterintuitive to logically apply some fields in table a to data collected
> for table b. Plus, I do so many more operations on the single tables than the
> joined tables that it wouldn't make good use of programming time to rewrite all
> the single table ops.
>
> I know... excuses, excuses! Thanks for the input. Any more ideas?
>
> CG
>
> --- Jean-Luc Lachance <jllachan(at)nsd(dot)ca> wrote:
> > Chris,
> >
> > I believe you can cut down the processing time by including the order by
> > and limit in each select:
> >
> > (SELECT ... ORDER BY 2 DESC, 4 LIMIT 20)
> > union
> > (SELECT ... ORDER BY 2 DESC, 4 LIMIT 20)
> > ORDER BY 2 DESC, 4 LIMIT 20
> >
> > Give it a try and let me know
> >
> >
> > JLL
> >
> > P.S.
> >
> > Question: Why not have a single table with a field log_type in ( 'A',
> > 'B') ???
> >
> >
> >
> > Chris Gamache wrote:
> > >
> > > Three tables...
> > > [...]
> > >
> > > I imagine the combination of UNION and ORDER BY causes the problem, since
> > > Postgres has to locate all the rows that match the search criteria, merge
> > them,
> > > order them, then return the top 20...
> > >
> > > Any suggestions? Did I forget to provide any data that would make things
> > > clearer?
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Sports - sign up for Fantasy Baseball
> http://sports.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Troy.Campano 2002-03-04 15:41:06 Allow all users with password
Previous Message Doug McNaught 2002-03-04 15:37:24 Re: backend closed