Re: problem (bug?) with "in (subquery)"

From: Luca Pireddu <lucap(at)shaw(dot)ca>
To: pgsql-sql(at)postgresql(dot)org
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: problem (bug?) with "in (subquery)"
Date: 2005-07-15 15:52:01
Message-ID: 200507150952.01732.lucap@shaw.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On July 15, 2005 07:34, Michael Fuhr wrote:
> On Thu, Jul 14, 2005 at 01:34:21AM -0600, Luca Pireddu wrote:
> > I have the following query that isn't behaving like I would expect:

Thanks for creating the reduced test case Michael. My apologies for not doing
it myself.

> >
> > select * from strains s where s.id in (select strain_id from
pathway_strains);
>
> Any reason the subquery isn't doing "SELECT DISTINCT strain_id"?

because I don't need to according to the specification of "in". However, it
does generate the correct output. So does

select distinct * from strains s where s.id in (select strain_id from
pathway_strains);

>
> > I would expect each strain record to appear only once. Instead I get
output
> > like this, where the same strain id appears many times:
> >
> > id | name | organism
> > -------+--------------+----------
> > 83 | common | 82
> > 83 | common | 82
> > 83 | common | 82
>
> What happens when you try each of the following? Do they give the
> expected results? I did some tests and I'm wondering if the planner's
> hash join is responsible for the duplicate rows.
>
> SELECT * FROM strains WHERE id IN (
> SELECT strain_id FROM pathway_strains ORDER BY strain_id
> );

With the "order by" it works as it should, not generating duplicate rows.

>
> CREATE TEMPORARY TABLE foo AS SELECT strain_id FROM pathway_strains;
> SELECT * FROM strains WHERE id IN (SELECT strain_id FROM foo);

This one's interesting. It only returns the unique rows.

>
> SET enable_hashjoin TO off;
> SELECT * FROM strains WHERE id IN (SELECT strain_id FROM pathway_strains);

With hashjoin off the query returns the correct output.

On July 15, 2005 08:58, Tom Lane wrote:
> Ah-hah: this one is the fault of create_unique_path, which quoth
>
> /*
> * If the input is a subquery whose output must be unique already, we
> * don't need to do anything.
> */
>
> Of course, that needs to read "... unique already, *and we are using all
> of its output columns in our DISTINCT list*, we don't need to do
> anything."
>
> regards, tom lane

In any case, it looks like Tom has already found the problem :-) Thanks guys!

Luca

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Dianne Yumul 2005-07-15 16:54:18 Re: Postgres for Fedora Core 2 OS ****************
Previous Message Tom Lane 2005-07-15 14:58:25 Re: problem (bug?) with "in (subquery)"