Re: Help with sql

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with sql
Date: 2012-07-07 00:43:32
Message-ID: 4FF78634.9080800@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 07/06/2012 03:34 PM, Perry Smith wrote:
> Hi Guys,
>
> This isn't a PostgreSQL specific question but just a SQL question. If this is not an appropriate question for this list, please let me know.
>
> It is also, perhaps, a really silly question.
>
> This query (without the 'explain' keyword) , when executed takes forever and a day:
>
>> condor_development=> explain select id from filesets where id not in ( select fileset_id from service_pack_fileset_maps );
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------
>> Seq Scan on filesets (cost=0.00..71937742.00 rows=26088 width=4)
>> Filter: (NOT (SubPlan 1))
>> SubPlan 1
>> -> Materialize (cost=0.00..2517.78 rows=95852 width=4)
>> -> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4)
>> (5 rows)
>
> This query returns within a second:
>
>> condor_development=> explain select id from filesets where id not in ( select distinct fileset_id from service_pack_fileset_maps );
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------
>> Seq Scan on filesets (cost=2102.31..3153.53 rows=26088 width=4)
>> Filter: (NOT (hashed SubPlan 1))
>> SubPlan 1
>> -> HashAggregate (cost=1903.15..2062.48 rows=15933 width=4)
>> -> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4)
>> (5 rows)
>
> The difference is the "distinct" keyword in the inner select.
>
> What I'm confused about is why isn't the "distinct" implicit? I thought the construct "blah in ( select ... )" was using "sets" and an item (I thought) can not be in a set more than once.
>
> Perhaps my question is the opposite really? Why would you not always use "distinct" in the inner select when the operator is "in" or "not in" ?
>
> And if I can throw in another question on top: is there a different method other than "not in" that would work better?
>
> Thank you guys for the help and a really awesome database.
> pedz
> (this is *suppose* to be sent as plain text... I hope my mailer does what it is told)
>
>

Well they are distinct records, they just may have the same values. And
I'm not trying to be flippant. We don't see the structure of those
table: are all the id fields involved primary keys or with unique index
coverage? Does "not exists ( select fileset.id = fileset_id from
service_pack_fileset_map) change the behaviour?

In response to

Browse pgsql-general by date

  From Date Subject
Next Message ach 2012-07-07 00:51:37 Re: index and data tablespaces on two separate drives or one RAID 0?
Previous Message Steven Schlansker 2012-07-06 23:25:05 Re: Suboptimal query plan fixed by replacing OR with UNION