Re: Advice for optimizing queries using Large Tables

From: Francisco Reyes <lists(at)natserv(dot)com>
To: Shaun Grannis <shaun_grannis(at)hotmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Advice for optimizing queries using Large Tables
Date: 2002-03-10 18:30:45
Message-ID: 20020310131915.O42725-100000@zoraida.natserv.net
Lists: pgsql-general

On Sat, 9 Mar 2002, Shaun Grannis wrote:

> I'm working with a table containing over 65 million records in Postgres v
> 7.1.3. The machine is a dual-processor Athlon MP1900 (Tyan Tiger board) with
> 3GB of PC2100 DDR RAM, and 3-80GB IBM 120GXP hard drives configured in a
> software RAID 0 Array running under RedHat Linux v. 7.2. Queries don't seem
> to be running as fast as "they should".

Have you considered moving to a SCSI setup?

> SELECT count(*) FROM table WHERE value=1999;
> takes approximately 45 minutes to execute, and returns a count of approx 2.2
> million records. My "instinct" is that this is much too slow for a query of
> an indexed column running on this hardware.

As was suggested elsewhere in the thread, you may want to consider upgrading to 7.2.
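
It is also worth checking whether the planner is using the index at all
before blaming the hardware. A minimal sketch (plain EXPLAIN works on
7.1; EXPLAIN ANALYZE, which also reports actual run times, needs 7.2).
"table" here is just the placeholder name from your message, quoted so
it is a valid identifier:

    -- Refresh planner statistics first; stale statistics often
    -- push the planner into a sequential scan.
    VACUUM ANALYZE "table";

    -- Show the plan chosen for the slow query.
    EXPLAIN SELECT count(*) FROM "table" WHERE value = 1999;

    -- On 7.2 this also reports actual run times:
    -- EXPLAIN ANALYZE SELECT count(*) FROM "table" WHERE value = 1999;

If the plan shows a sequential scan rather than an index scan, the
statistics or the index definition are the first things to look at.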

> Here's the table schema:
> Table "table"
>
> Attribute | Type | Modifier
> -----------+---------------+----------
> col01 | character(9) |
> col02 | character(15) |
> col03 | character(15) |
> col04 | character(1) |
> col05 | character(15) |
> col06 | character(15) |
> col07 | character(15) |
> col08 | character(15) |
> col09 | character(1) |
> col10 | integer |
> col11 | integer |
> col12 | integer |
> col13 | integer |
> col14 | integer |
> value | integer |

> Does anyone have any advice for optimizing the SELECT query listed above?

Another "optimization"/trick we do here is to split tables into the most
needed info and the least needed info. If you have a part of the data
which is used often you put it in the "main" table. If you have data which
is big and not used often you put it in a second table. In our case we
have a few tables where the size of the data which is not used often can
be from 2 to 4 times the size of the data used often. This helps a lot
with joins and sequential scans.
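
A minimal sketch of what that split could look like for your schema.
The table names and the "id" key are made up for illustration; only
"value" and the character columns come from your message:

    -- Narrow "main" table: the columns your queries hit constantly.
    CREATE TABLE big_main (
        id     integer PRIMARY KEY,
        value  integer
    );
    CREATE INDEX big_main_value_idx ON big_main (value);

    -- The wide, rarely-used columns live in a side table and are
    -- joined back by id only when they are actually needed.
    CREATE TABLE big_detail (
        id     integer PRIMARY KEY REFERENCES big_main (id),
        col01  character(9),
        col02  character(15)
        -- ...and the remaining wide columns
    );

With the character(15) columns out of the main table, each row is much
smaller, so a sequential scan or a count over "value" touches far fewer
disk pages.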

> Is this as fast as Postgresql will perform? Any good pointers on working
> with large tables in Postgres?

I suspect this is not as fast as PostgreSQL can perform. Although my
data set is not 60+ million records, a somewhat complex aggregate plus
join over 5+ million records takes on the order of 30 minutes here.
