From: "Rod Taylor" <rbt(at)zort(dot)ca>
To: "large scale" <largescale_1999(at)yahoo(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join of small table with large table
Date: 2002-05-13 14:11:21
Message-ID: 044b01c1fa88$0f4c4f00$0f02000a@jester
Lists: pgsql-hackers
Given your dataset and the way indexes work, the scans probably aren't
the big problem with this query. I'm actually rather surprised it chose
an index on the smaller table.
It's the aggregate that's taking the time, which means a faster CPU or a
simpler aggregate will do the trick. I.e., do you really need that
DISTINCT part?
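An untested sketch, using the table1/table2 names from your mail and
assuming start <= stop holds in both tables:

```sql
-- Untested sketch. EXISTS stops probing table2 at the first overlapping
-- row per table1 row, so there is no billion-row join result to feed
-- into the aggregate.
select count(distinct table1.id)
from table1
where exists (
    select 1
    from table2
    where table2.id = table1.id
      -- same overlap test as your two OR'd conditions,
      -- assuming start <= stop in both tables
      and table2.start <= table1.stop
      and table2.stop  >= table1.start
);
```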
--
Rod
----- Original Message -----
From: "large scale" <largescale_1999(at)yahoo(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Sent: Friday, May 10, 2002 2:04 PM
Subject: [HACKERS] Join of small table with large table
> Hi,
>
> I have two tables, one has 25000 rows and the other
> has 6.5 million rows.
>
> (25000 rows)
> table1
> (id text,
> start int,
> stop int)
>
> with separate indexes on the three individual fields.
>
> 6.5 million rows
> table2
> (id text,
> start int,
> stop int)
>
> with separate indexes on the three individual fields.
>
> When I query these two tables to find overlapping
> records, I use this query:
>
>
> **********************************************************************
> select count(distinct(table1.id))
> from table1, table2
> where table1.id = table2.id
>   and ( (table2.start >= table1.start and table2.start <= table1.stop)
>      or (table2.start <= table1.start and table1.start <= table2.stop) );
>
> **********************************************************************
>
> When I run EXPLAIN, I get this back:
>
>
> **********************************************************************
> Aggregate  (cost=353859488.21..353859488.21 rows=1 width=78)
>   ->  Merge Join  (cost=1714676.02..351297983.38 rows=1024601931 width=78)
>         ->  Index Scan using genescript_genomseqid on genescript
>               (cost=0.00..750.35 rows=25115 width=62)
>         ->  Sort  (cost=1714676.02..1714676.02 rows=6801733 width=16)
>               ->  Seq Scan on mouseblathuman
>                     (cost=0.00..153685.33 rows=6801733 width=16)
>
> EXPLAIN
>
> **********************************************************************
>
> My questions are: 1) Why does the query start with a
> seq scan on the much bigger table? I think it should
> scan the first table and use an index on the
> bigger table.
> 2) The query itself takes forever;
> is there a way to speed it up?
> 3) Does this have anything to
> do with the query planner?
>
> This is kind of an urgent project, so your prompt help
> is greatly appreciated. Thanks.
>
> Jim
>
>
>