Re: Join question

From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: tyrrill_ed(at)emc(dot)com, pgsql-sql(at)postgresql(dot)org
Subject: Re: Join question
Date: 2007-08-21 19:16:40
Message-ID: 73073.2010.qm@web31805.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

--- tyrrill_ed(at)emc(dot)com wrote:

> create table c (
> a_id int,
> b_id int
> );
>
> I am doing a query like this:
>
> SELECT a.x, max(b.x) FROM a, b, c WHERE a.a_id = c.a_id AND b.b_id =
> c.b_id GROUP by a.x;
>
> I only need to get one row from b for each row in a, and it really
> doesn't matter which one. I use max() to get a single value from table
> b. There are generally be dozens to hundreds of rows in b for each row
> in a. The problem is when I have a query with tens of thousands of rows
> in a that the join with b will have millions of rows, and is really
> slow. The group by effectively reduces the results down to what I want,
> but it still has to process the millions of rows. Does anyone know a
> way I could restructure this query to get only one b for each a in a
> faster way?

CREATE INDEX table_c_foreign_key ON c ( a, b );

SELECT a.x, b.x
FROM ( SELECT DISTINCT( a_id ) a_id, b_id
FROM c ) AS c( a_id, b_id )
INNER JOIN a
ON c.a_id = a.id
INNER JOIN b
ON c.b_id = b.id;

Regards,
Richard Broersma Jr.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Broersma Jr 2007-08-21 19:18:52 Re: Join question
Previous Message tyrrill_ed 2007-08-21 17:48:18 Join question