Re: Joining 2 tables with 300 million rows

From: Amit V Shah <ashah(at)tagaudit(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Joining 2 tables with 300 million rows
Date: 2005-12-08 22:01:01
Message-ID: 0C072E7CC947D511AC9600A0CC7341200256D14C@xeon400.tagaudit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

The thing is, although it shows 0.15 seconds, when I run the actual query,
it takes around 40-45 seconds (sorry I forgot to mention that). And then
sometimes it depends on data. Some parameters have very less number of
records, and others have lot more. I dont know how to read the "explan"
results very well, but looked like there were no sequential scans and it
only used indexes.

Also, another problem is, the second time I run this query, it returns it
from cache I believe. So the second time I run it, it returns in like 2
seconds or
so !

Thats why I was worrying if joining 2 tables like that is even advisable at
all ...

Thanks,
Amit

-----Original Message-----
From: Dmitri Bichko [mailto:dbichko(at)aveopharma(dot)com]
Sent: Thursday, December 08, 2005 1:47 PM
To: Amit V Shah; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Joining 2 tables with 300 million rows

What's the problem? You are joining two 300 million row tables in 0.15
of a second - seems reasonable.

Dmitri

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> Amit V Shah
> Sent: Thursday, December 08, 2005 11:59 AM
> To: 'pgsql-performance(at)postgresql(dot)org'
> Subject: [PERFORM] Joining 2 tables with 300 million rows
>
>
> Hi all,
>
> First of all, please pardon if the question is dumb! Is it
> even feasible or normal to do such a thing ! This query is
> needed by a webpage so needs to be lightning fast. Anything
> beyond 2-3 seconds is unacceptable performance.
>
> I have two tables
>
> CREATE TABLE runresult
> (
> id_runresult int8 NOT NULL,
> rundefinition_id_rundefinition int4 NOT NULL,
> measure_id_measure int4 NOT NULL,
> value float4 NOT NULL,
> "sequence" varchar(20) NOT NULL,
> CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult),
> )
>
>
> CREATE TABLE runresult_has_catalogtable
> (
> runresult_id_runresult int8 NOT NULL,
> catalogtable_id_catalogtable int4 NOT NULL,
> value int4 NOT NULL,
> CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY
> (runresult_id_runresult, catalogtable_id_catalogtable, value)
> CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES
> runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT
> )
>
> Each table has around 300 million records (will grow to
> probably billions). Below is the query and the explain analyze --
>
> explain analyze SELECT measure.description, runresult.value
> FROM ((((rundefinition INNER JOIN runresult ON
> rundefinition.id_rundefinition =
> runresult.rundefinition_id_rundefinition)
> INNER JOIN runresult_has_catalogtable ON runresult.id_runresult =
> runresult_has_catalogtable.runresult_id_runresult)
> INNER JOIN runresult_has_catalogtable AS
> runresult_has_catalogtable_1 ON runresult.id_runresult =
> runresult_has_catalogtable_1.runresult_id_runresult)
> INNER JOIN runresult_has_catalogtable AS
> runresult_has_catalogtable_2 ON runresult.id_runresult =
> runresult_has_catalogtable_2.runresult_id_runresult)
> INNER JOIN measure ON runresult.measure_id_measure =
> measure.id_measure WHERE
> (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52)
> AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54)
> AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55)
> AND ((runresult_has_catalogtable.value)=15806)
> AND ((runresult_has_catalogtable_1.value)=1)
> AND ((runresult_has_catalogtable_2.value) In (21,22,23,24))
> AND ((rundefinition.id_rundefinition)=10106));
>
> 'Nested Loop (cost=0.00..622582.70 rows=1 width=28) (actual
> time=25.221..150.563 rows=22 loops=1)' ' -> Nested Loop
> (cost=0.00..622422.24 rows=2 width=52) (actual
> time=25.201..150.177 rows=22 loops=1)'
> ' -> Nested Loop (cost=0.00..622415.97 rows=2
> width=32) (actual
> time=25.106..149.768 rows=22 loops=1)'
> ' -> Nested Loop (cost=0.00..621258.54 rows=15
> width=24)
> (actual time=24.582..149.061 rows=30 loops=1)'
> ' -> Index Scan using pk_rundefinition on
> rundefinition
> (cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147
> rows=1 loops=1)'
> ' Index Cond: (id_rundefinition = 10106)'
> ' -> Nested Loop (cost=0.00..621254.54 rows=15
> width=28) (actual time=24.443..148.784 rows=30 loops=1)'
> ' -> Index Scan using
> runresult_has_catalogtable_value on
> runresult_has_catalogtable (cost=0.00..575069.35 rows=14437
> width=8) (actual time=0.791..33.036 rows=10402 loops=1)'
> ' Index Cond: (value = 15806)'
> ' Filter:
> (catalogtable_id_catalogtable =
> 52)'
> ' -> Index Scan using pk_runresult_ars on
> runresult (cost=0.00..3.19 rows=1 width=20) (actual
> time=0.007..0.007 rows=0 loops=10402)'
> ' Index Cond: (runresult.id_runresult =
> "outer".runresult_id_runresult)'
> ' Filter: (10106 =
> rundefinition_id_rundefinition)'
> ' -> Index Scan using
> runresult_has_catalogtable_id_runresult
> on runresult_has_catalogtable runresult_has_catalogtable_1
> (cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017
> rows=1 loops=30)'
> ' Index Cond:
> (runresult_has_catalogtable_1.runresult_id_runresult =
> "outer".runresult_id_runresult)'
> ' Filter: ((catalogtable_id_catalogtable =
> 54) AND (value
> = 1))'
> ' -> Index Scan using pk_measure on measure
> (cost=0.00..3.12 rows=1
> width=28) (actual time=0.008..0.010 rows=1 loops=22)'
> ' Index Cond: ("outer".measure_id_measure =
> measure.id_measure)'
> ' -> Index Scan using
> runresult_has_catalogtable_id_runresult on
> runresult_has_catalogtable runresult_has_catalogtable_2
> (cost=0.00..79.42 rows=65 width=8) (actual time=0.007..0.010
> rows=1 loops=22)'
> ' Index Cond:
> (runresult_has_catalogtable_2.runresult_id_runresult =
> "outer".runresult_id_runresult)'
> ' Filter: ((catalogtable_id_catalogtable = 55) AND
> ((value = 21) OR
> (value = 22) OR (value = 23) OR (value = 24)))'
> 'Total runtime: 150.863 ms'
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rory Campbell-Lange 2005-12-08 22:56:20 Re: Disk tests for a new database server
Previous Message Jeffrey W. Baker 2005-12-08 19:21:48 Re: opinion on disk speed