From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Ben <bench(at)silentmedia(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: any hope for my big query?
Date: 2006-09-29 15:35:57
Message-ID: 20060929153557.GD90915@nasby.net
Lists: pgsql-performance
There's no join criterion for umdb.node... is that really what you want?
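
Spelling the FROM clause out with explicit join syntax makes that easier to
see: node only touches the rest of the query through the levenshtein()
filter, so it is effectively cross-joined against the 16 dir=2811 rows.
A rough, untested sketch of the same query, just reorganized:

    select distinct public.album.id
    from public.album
    join public.albumjoin on albumjoin.album = public.album.id
    join public.track on public.albumjoin.track = public.track.id
    cross join umdb.node                      -- no join condition on node
    where node.dir = 2811
      and levenshtein(substring(public.track.name for 75),
                      substring(node.file for 75)) <= 10
      and public.album.id in
          (select album from albumjoin group by album
           having count(*) between 15 and 25)
    group by public.album.id
    having count(*) >= 5;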
On Thu, Sep 28, 2006 at 03:18:56PM -0700, Ben wrote:
> Hey guys, I've got a query that is inherently expensive, because it has to
> do some joins against some large tables. But it's currently *very*
> expensive (at least for a web app), and I've been struggling in vain all
> day to knock the cost down. Annoyingly, the least costly version I've come
> up with remains my first attempt, and is the most straight-forward:
>
> explain select distinct public.album.id
> from public.album, public.albumjoin, public.track, umdb.node
> where node.dir = 2811
>   and albumjoin.album = public.album.id
>   and public.albumjoin.track = public.track.id
>   and levenshtein(substring(public.track.name for 75),
>                   substring(node.file for 75)) <= 10
>   and public.album.id in
>       (select album from albumjoin group by album
>        having count(*) between 15 and 25)
> group by public.album.id
> having count(*) >= 5;
>
>
> Unique  (cost=991430.53..1013711.74 rows=425772 width=4)
>   ->  GroupAggregate  (cost=991430.53..1012647.31 rows=425772 width=4)
>         Filter: (count(*) >= 5)
>         ->  Sort  (cost=991430.53..996373.93 rows=1977360 width=4)
>               Sort Key: album.id
>               ->  Nested Loop  (cost=513549.06..737866.68 rows=1977360 width=4)
>                     Join Filter: (levenshtein("substring"(("inner".name)::text, 1, 75), "substring"("outer".file, 1, 75)) <= 10)
>                     ->  Index Scan using node_dir on node  (cost=0.00..3.22 rows=16 width=40)
>                           Index Cond: (dir = 2811)
>                     ->  Materialize  (cost=513549.06..520153.61 rows=370755 width=25)
>                           ->  Hash Join  (cost=271464.72..510281.31 rows=370755 width=25)
>                                 Hash Cond: ("outer".id = "inner".track)
>                                 ->  Seq Scan on track  (cost=0.00..127872.69 rows=5111469 width=25)
>                                 ->  Hash  (cost=268726.83..268726.83 rows=370755 width=8)
>                                       ->  Hash Join  (cost=150840.51..268726.83 rows=370755 width=8)
>                                             Hash Cond: ("outer".album = "inner".id)
>                                             ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=8)
>                                             ->  Hash  (cost=150763.24..150763.24 rows=30908 width=8)
>                                                   ->  Hash Join  (cost=127951.57..150763.24 rows=30908 width=8)
>                                                         Hash Cond: ("outer".id = "inner".album)
>                                                         ->  Seq Scan on album  (cost=0.00..12922.72 rows=425772 width=4)
>                                                         ->  Hash  (cost=127874.30..127874.30 rows=30908 width=4)
>                                                               ->  HashAggregate  (cost=126947.06..127565.22 rows=30908 width=4)
>                                                                     Filter: ((count(*) >= 15) AND (count(*) <= 25))
>                                                                     ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)
>
>
> I've tried adding a length(public.track.name) index and filtering
> public.track to those rows where length(name) is within a few characters
> of node.file, but that actually makes the plan more expensive.
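
(For reference, I read that as roughly the following; the index name and the
size of the length window are placeholders, untested:)

    -- expression index on the track name length
    create index track_name_len_idx on public.track (length(name));

    -- plus a cheap pre-filter ahead of the expensive levenshtein() call:
    --   and abs(length(public.track.name) - length(node.file)) <= 10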
>
> Is there any hope to make things much cheaper? Unfortunately, I can't
> filter out anything from the album or albumjoin tables.
>
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)