Re: pgsql and large tables

From: Francisco Reyes <lists(at)natserv(dot)com>
To: Gurupartap Davis <partap(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pgsql and large tables
Date: 2001-11-15 17:07:56
Message-ID: 20011115115301.C60187-100000@zoraida.natserv.net
Lists: pgsql-general

On Tue, 13 Nov 2001, Gurupartap Davis wrote:

I am somewhat new to PostgreSQL, but I didn't see a reply to your message,
so I'll tell you what I know about your questions.

> I've got a large database, currently about 40 million rows in the biggest table.
...

> I initially created the table and with a primary key (5 fields: char(4), varchar(32), date, int, int)

Out of curiosity, why are you using this as your primary key and not a
serial? Will your tables be normalized?

>and a couple additional indexes, one that just changes the order of the
>primary key, and one that sorts on the date.

From what I have read so far, your primary key doesn't sound like what you
would want to have as a primary key... but I guess the design of your
table is a totally different issue.

> Then I imported 40 million rows and tried some queries, but discovered
>that none of my queries were using indexes, causing them to take forever.

It is difficult for anyone to reply to generic questions. How about
showing us the query, an EXPLAIN plan, and the index definitions? Are any
of the indexes based on a function?
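
For instance, assuming a hypothetical table called measurements with a
date column taken_on (both names invented here, just for illustration),
something like:

    -- 'measurements' and 'taken_on' are made-up names
    EXPLAIN SELECT count(*) FROM measurements
    WHERE taken_on = '2001-11-14';

would show whether the planner picks a "Seq Scan" or an "Index Scan". If
it says Seq Scan on a selective condition like that, the planner is
ignoring your index, and that EXPLAIN output is exactly what people here
will want to see.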

> So I read somewhere in the archives that you need to VACUUM a table regularly
>for indexes to work properly.

Sort of. Let's say they work better. VACUUM ANALYZE updates the statistics
about your data distribution that the optimizer uses to decide when an
index is worth using.
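
I believe you can also run it against a single table instead of the whole
database, for example (table name invented again):

    VACUUM ANALYZE measurements;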

> I tried that, but aborted after about 5 hours.

You have told us nothing about your hardware, your OS, or which version of
PostgreSQL you are running (unless I missed it).

> I can't use pgsql if I have to take the db down for more than 10-15 minutes a day.

Then maybe you can't use it right now. I believe that the current VACUUM
ANALYZE, and maybe plain VACUUM too, needs to lock the table while it
runs. Hopefully others will explain this better. This will also change in
the soon-to-be-released 7.2.

>Then I read somewhere else that you should drop your indexes before VACUUMing
>and re-create them afterwards.

That makes little sense to me; I also don't recall reading it. What I do
recall is that before a bulk load you want to drop your indexes, since
updating them for every inserted row is much slower than building them
once at the end. My understanding is that you want your indexes in place
when you run VACUUM ANALYZE.
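
Roughly, with invented index and file names, the bulk-load sequence I have
in mind is:

    -- drop the secondary index before the load
    DROP INDEX measurements_taken_on_idx;
    -- bulk load (COPY from a server-side file needs superuser)
    COPY measurements FROM '/tmp/measurements.copy';
    -- rebuild the index once, after all the rows are in
    CREATE INDEX measurements_taken_on_idx
        ON measurements (taken_on);
    -- refresh the planner's statistics
    VACUUM ANALYZE measurements;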

>I tried that, and VACUUM finished in about 10 minutes.
>I've been trying to recreate my primary key for the last
>18 hours...not so good.

Again, we need more info about your hardware, OS, version of pgsql, etc...

> Should I have dropped all indexes *except* for the primary?

My understanding is that you may want to drop your indexes when doing a
big load, not when doing a vacuum.

>Should I make an artificial primary key with a serial type to simplify things?

I recommend you do this. Not only is it more efficient, but the type of
key you selected has "issues," for lack of a better term. You can of
course keep a unique index on the original five columns so you don't get
duplicates.
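
A rough sketch of what I mean; the table and column names are invented,
but the types are the five you listed:

    CREATE TABLE measurements (
        id       SERIAL PRIMARY KEY,   -- small surrogate key
        station  CHAR(4),
        series   VARCHAR(32),
        taken_on DATE,
        val1     INT,
        val2     INT,
        -- still reject duplicates on the original five columns
        UNIQUE (station, series, taken_on, val1, val2)
    );

The serial id gives you a small, simple key to join and index on, while
the UNIQUE constraint keeps the same duplicate protection your old primary
key gave you.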

I think that when you are dealing with a 40-million-row table you need to
consider your hardware. You also didn't tell us how big the rows are.
The more info you give us, the better others will be able to help.
