Re: Finding uniques across a big join

From: "John D(dot) Burger" <john(at)mitre(dot)org>
To: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Finding uniques across a big join
Date: 2005-11-30 22:27:04
Message-ID: 254e417bc5cb3bca0441a79c8cc5c587@mitre.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Scott Marlowe wrote:

> OK, let's assume that the basic part of it, before the group by, has
> been put into a view, so we can then do:
>
> select pkey1, field2, field3, field4 from view;
>
> And we know that pkey1 is unique, but we want the records where pkey1
> is
> the only thing different between them, right?

Hmm, I'm explaining this really badly :). I should have defined a view
like you suggest to help simplify it. What I want is the pkeys (and
the field values) where no other pkey has that triple of field values.
That's why my earlier query does a group by the fields and then having
count(*) = 1. Also, FWIW, pkey1 is unique in its original table, but
not in the view, since some of the other tables are one-to-many.

> select
> v1.pkey1,
> v1.field2,
> v1.field3,
> v1.field4,
> v2.pkey1,
> v2.field2,
> v2.field3,
> v2.field4,
> from
> view v1
> join
> view v2
> on (
> v1.field2=v2.field2 and
> v1.field3=v2.field3 and
> v1.field3=v2.field3 and
> v1.pkey1<>v2.pkey
> )
>
> How does that work?

Won't this be a massive cross product of all pkey pairs that have the
same field values?

Here's what I'm currently using, in terms of your very helpful view:

select v1.pkey1, v1.field2, v1.field3, v1.field4
from view as v1
join
(select v2.field1, v2.field2, v2.field3
from view as v2
group by v2.field2, v2.field3, v2.field4
having count(*) = 1)
using (field2, field3, field4);

This is the one that takes eight hours. :( Another way to express what
I want is this:

select v1.pkey1, v1.field2, v1.field3, v1.field4
from view as v1
where not exists
(select true from view as v2
where v1.field2 = v2.field2
and v1.field3 = v2.field3
and v1.field4 = v2.field4
and v1.pkey1 <> v2.pkey1);

That looks like a horrible nested loop, but I suppose I should try it
to make sure it is indeed slower then the previous query.

- John Burger
MITRE

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tyler MacDonald 2005-11-30 22:34:11 Re: undefined behaviour for sub-transactions?
Previous Message Jerry Sievers 2005-11-30 22:12:36 Re: How to check options PostgreSQL was started with