Re: help with query

From: Brad Bulger <brad(at)madfish(dot)com>
To: pg(at)fastcrypt(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: help with query
Date: 2004-08-19 13:38:48
Message-ID: 4124AD68.8000208@madfish.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You're doing a join except not, is the trouble, looks like. The query is really
"FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
to the Links table. So you end up getting every row in Links for each row in
Tickets with id = 17417.

I'd think this wants to be two queries or a union:

SELECT distinct main.oid,main.* FROM Tickets main
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (main.id = '17417'))
union
SELECT distinct main.oid,main.* FROM Tickets main, Links
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (Links.Type = 'MemberOf') )
AND ( (Links.LocalTarget = '17417') )
AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
;

or else, yah, a subquery:

[...]
AND (
main.id = '17417'
or
exists(
select true from Links
where Type = 'MemberOf' and LocalTarget = '17417'
and (LocalBase = main.id or LocalTarget = main.id)
)
)

Those are the only things I can think of to make it work, anyways.

Dave Cramer wrote:

> RT uses a query like:
>
> SELECT distinct main.oid,main.* FROM Tickets main
> WHERE
> (main.EffectiveId = main.id)
> AND
> (main.Status != 'deleted')
> AND
> ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND
> ( (main.Queue = '9') )
> AND ((
> ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
> OR
> ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
> or
> (main.id = '17417')
> )
> );
>
>
> which produces a query plan:
>
> Nested Loop (cost=0.00..813.88 rows=1 width=169)
> Join Filter: (((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id
> = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner"
> .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR (
> "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca
> lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)))
> -> Index Scan using tickets1 on tickets main (cost=0.00..657.61 rows=1 width=169)
> Index Cond: (queue = 9)
> Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text) OR (("type")::text = 'subticket'::text)))
> -> Seq Scan on links (cost=0.00..46.62 rows=1462 width=20)
>
> If I rewrite the query as:
>
> SELECT main.* FROM Tickets main
> WHERE
> (main.EffectiveId = main.id)
> AND
> (main.Status != 'deleted')
> AND
> ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND
> ( (main.Queue = '9') )
> AND (
> 17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase)
> or
> 17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget)
> or
> main.id = '17417'
> )
> ;
>
> The time for the query goes from 1500ms to 15ms. The two OR clauses
>
> ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
> OR
> ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
>
> don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically.
>
> Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical,
> and subqueries are not easily embraced.
>
> Dave

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Cramer 2004-08-19 13:50:38 Re: help with query
Previous Message Dave Cramer 2004-08-19 13:17:13 help with query