From: Ron <rjpeace(at)earthlink(dot)net>
To: Joost Kraaijeveld <J(dot)Kraaijeveld(at)Askesis(dot)nl>, Pgsql-Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Can this query go faster???
Date: 2005-12-06 13:41:11
Message-ID: 6.2.5.6.0.20051206080854.01de6ba0@earthlink.net
Lists: pgsql-performance
At 04:43 AM 12/6/2005, Joost Kraaijeveld wrote:
>Hi,
>
>Is it possible to get this query to run faster than it does now, by adding
>indexes or changing the query?
>
>SELECT customers.objectid FROM prototype.customers, prototype.addresses
>WHERE
>customers.contactaddress = addresses.objectid
>ORDER BY zipCode asc, housenumber asc
>LIMIT 1 OFFSET 283745
>
>Explain:
>
>Limit (cost=90956.71..90956.71 rows=1 width=55)
> -> Sort (cost=90247.34..91169.63 rows=368915 width=55)
> Sort Key: addresses.zipcode, addresses.housenumber
> -> Hash Join (cost=14598.44..56135.75 rows=368915 width=55)
> Hash Cond: ("outer".contactaddress = "inner".objectid)
> -> Seq Scan on customers (cost=0.00..31392.15
>rows=368915 width=80)
> -> Hash (cost=13675.15..13675.15 rows=369315 width=55)
> -> Seq Scan on addresses (cost=0.00..13675.15
>rows=369315 width=55)
>
>The customers table has an index on contactaddress and objectid.
>The addresses table has an index on zipcode+housenumber and objectid.
>
>TIA
Customer names, customers.objectid, addresses, and addresses.objectid
should all be essentially static (addresses do not change, just the
customers associated with them; and once a customer has been assigned
an id, that id should never change...).
To me, this sounds like the addresses and customers tables should be
duplicated and then physically laid out in sorted order by
<tablename>.objectid in one set and by the "human friendly"
associated string in the other set.
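Roughly something like this (the duplicate table names below are just
placeholders I made up to illustrate the idea; use whatever fits your
schema):

-- one copy of addresses physically ordered by objectid ...
CREATE TABLE prototype.addresses_by_id AS
  SELECT * FROM prototype.addresses ORDER BY objectid;
-- ... and a second copy ordered by the human-friendly key
CREATE TABLE prototype.addresses_by_zip AS
  SELECT * FROM prototype.addresses ORDER BY zipcode, housenumber;
-- same idea for customers: one copy by objectid, one by customer name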
Then finding a specific <tablename>.objectid or its associated
string can be done in at worst O(lg n) time, assuming a binary search,
instead of the O(n) time of a sequential scan. If pg is clever enough,
it might be able to do better than that.
IOW, I'd try duplicating the addresses and customers tables and using
the appropriate CLUSTERed index on each copy.
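Something like the following (you didn't post the index names, so the
ones below are guesses; substitute the real ones. CLUSTER indexname ON
tablename is the 8.1-era syntax):

-- keep the originals physically ordered by objectid
CLUSTER addresses_objectid_idx ON prototype.addresses;
CLUSTER customers_objectid_idx ON prototype.customers;
-- and the duplicates ordered by the human-friendly sort key
CREATE INDEX addresses_by_zip_idx
  ON prototype.addresses_by_zip (zipcode, housenumber);
CLUSTER addresses_by_zip_idx ON prototype.addresses_by_zip;
-- remember CLUSTER is a one-time physical rewrite; it has to be redone
-- after enough inserts/updates to matter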
I know this breaks Normal Form. OTOH, this kind of thing is common
practice for data mining problems on static or almost static data.
Hope this is helpful,
Ron