Quick Links

Re: Why is this query running slowly?

From:	"Tomas Vondra" <tv(at)fuzzy(dot)cz>
To:	"Yang Zhang" <yanghatespam(at)gmail(dot)com>
Cc:	"Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Why is this query running slowly?
Date:	2011-09-15 09:29:30
Message-ID:	d344513f086a96bd4149fad0c1d6839c.squirrel@sq.gransy.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 15 Září 2011, 11:07, Yang Zhang wrote:
> On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> On 15 Září 2011, 9:53, Yang Zhang wrote:
>>> I have a simple query that's been running for a while, which is fine,
>>> but it seems to be running very slowly, which is a problem:
>>>
>>> mydb=# explain select user_id from den where user_id not in (select
>>> duid from user_mappings) and timestamp between '2009-04-01' and
>>> '2010-04-01';
>>>
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4)
>>> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
>>> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
>>> without time zone) AND (NOT (SubPlan 1)))
>>> SubPlan 1
>>> -> Materialize (cost=711.58..1223.38 rows=36780 width=4)
>>> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780
>>> width=4)
>>>
>>> user_mappings is fairly small:
>>
>> The problem is that for each of the 22634720 rows in "den" a separate
>> uncorrelated subquery (a seq scan on user_mappings) has to be executed.
>> Althogh the subquery is not very expensive, multiplied by the number of
>> rows in "den" the total cost is extreme.
>>
>> The only solution is to get rid of the "not in" subquery - try to turn
>> it
>> to a join like this:
>>
>> SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
>> WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
>> AND (duid IS NULL)
>>
>> That should give the same result I guess.
>
> This worked great, thank you. Too bad the planner isn't smart enough
> to do this yet!

It's probably a bit more complicated I guess - there are probably cases
when this would not work this great.

Anyway, you should consider Toby Corkindale's recommendation and check if
there's an index on that timestamp column - that might further improve the
performance. (I don't think an index on user_id might be useful in this
case).

Tomas

In response to

Re: Why is this query running slowly? at 2011-09-15 09:07:49 from Yang Zhang

Browse pgsql-general by date

	From	Date	Subject
Next Message	Giorgio Valoti	2011-09-15 10:18:26	Re: Jenkins
Previous Message	Yang Zhang	2011-09-15 09:07:49	Re: Why is this query running slowly?