Re: Performance tips

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andrew_perrin(at)unc(dot)edu
Cc: Justin Clift <justin(at)postgresql(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance tips
Date: 2002-01-10 18:44:25
Message-ID: 21703.1010688265@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Andrew Perrin <andrew_perrin(at)unc(dot)edu> writes:
> auth=# EXPLAIN select count(patternid) from patterns where patternid in
> (select
> auth(# o_patternid from letters, pattern_occurrences where
> letters.letterid =
> auth(# pattern_occurrences.o_letterid and letters.datecat in (1,2));
> NOTICE: QUERY PLAN:

> Aggregate (cost=10770432787318.88..10770432787318.88 rows=1 width=4)
> -> Seq Scan on patterns (cost=0.00..10770432756138.14 rows=12472297
> width=4)
> SubPlan
> -> Materialize (cost=863548.43..863548.43 rows=5749731
> width=12)
> -> Hash Join (cost=1741.00..863548.43 rows=5749731
> width=12)
> -> Seq Scan on pattern_occurrences
> (cost=0.00..250248.56 rows=15287556 width=8)
> -> Hash (cost=1729.67..1729.67 rows=4530 width=4)
> -> Seq Scan on letters (cost=0.00..1729.67
> rows=4530 width=4)

Well, it's materializing the subquery result, which is good, but are
there really going to be 5.7M rows in the subquery result? If so,
no wonder you're hurting: the IN is going to be scanning through that
result for each row from the outer query, until it either finds a match
or reaches the end. Can you reduce the size of the subquery result at
all? (If the subquery as written produces a lot of duplicate
o_patternids, then making it be a SELECT DISTINCT might help.)

The long-term answer is probably that you need to convert the IN to some
smarter form of join. One idea that comes to mind is

select count(patternid)
from patterns,
(select distinct o_patternid from letters, pattern_occurrences where
letters.letterid = pattern_occurrences.o_letterid
and letters.datecat in (1,2)) AS ss
where patternid = ss.o_patternid;

Given the "select distinct" to ensure there are no duplicates in the
subselect output, this should produce the same output as the original,
I think, and it would give the planner a shot at using a merge or hash
join to match up the pattern id values.

Oh, BTW: you might also try kicking up sort_mem if you didn't already.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mike Schroepfer 2002-01-10 18:53:09 Insert Performance with WAL and Fsync
Previous Message Andrew Perrin 2002-01-10 18:31:57 Re: Performance tips