From: Amitabh Kant <amitabhkant(at)gmail(dot)com>
To: nair rajiv <nair331(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: splitting data into multiple tables
Date: 2010-01-25 17:33:23
Message-ID: 84b68b3d1001250933s741f6565t4e478b136a1fff84@mail.gmail.com
Lists: pgsql-performance
On Mon, Jan 25, 2010 at 10:53 PM, nair rajiv <nair331(at)gmail(dot)com> wrote:
> Hello,
>
> I am working on a project that extracts structured content from
> Wikipedia and puts it in our database. Before loading the data I
> wrote a script to estimate the number of rows each table would hold,
> and found that one table will have approximately 5 crore (50 million)
> entries after data harvesting.
> Is it advisable to keep so much data in one table?
> I have read about 'partitioning' a table. Another idea I have is to
> break the table into separate tables once the number of rows reaches
> a certain limit, say 10 lakh (1 million). For example, dividing a
> table 'datatable' into 'datatable_a', 'datatable_b', each having
> 10 lakh entries.
> I need advice on whether I should go for partitioning or the approach
> I have thought of.
> We have an HP server with 32GB RAM and 16 processors. The storage has
> 24TB of disk space (1TB per disk) in RAID-5. It would be great to
> know which parameters in the postgres configuration file we should
> change so that the database makes maximum use of the server we have,
> for example parameters that would increase the speed of inserts and
> selects.
>
>
> Thank you in advance
> Rajiv Nair
We have several servers with tables that regularly exceed 50 million
records, on dual quad-core machines with 8 GB RAM and 4 SAS 15K hard
disks in RAID 10. If 50 million is the maximum number of records you
are looking at, I would suggest not breaking up the table. Rather,
configure the settings in postgresql.conf to handle such loads.
You already have a powerful machine (I assume it has 16 cores, not 16
physical processors), and if configured well it should have no problem
serving those records. For tuning PostgreSQL, you can take a look at
pgtune (http://pgfoundry.org/projects/pgtune/).
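As an illustration of the kind of output pgtune produces, the
postgresql.conf entries for a dedicated 32 GB box might look roughly
like the following. These values are my assumptions, not numbers tuned
for your workload, so benchmark before adopting them:

# Rough starting values for a dedicated 32 GB server; treat these as
# guesses to benchmark against, not recommendations.
shared_buffers = 8GB                 # ~25% of RAM
effective_cache_size = 24GB          # memory the OS can use for caching
work_mem = 64MB                      # per sort/hash; watch concurrency
maintenance_work_mem = 1GB           # faster index builds and VACUUM
checkpoint_segments = 64             # spread checkpoints out during bulk loads
checkpoint_completion_target = 0.9
wal_buffers = 16MB

For insert speed specifically, raising checkpoint_segments and doing
bulk loads with COPY inside a transaction will usually buy you more
than most other knobs.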
Two changes I can suggest on the hardware side: go for SAS 15K disks
instead of SATA if you can do with less capacity, and for RAID 10
instead of RAID 5.
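If you do reach the point where splitting the table becomes necessary,
PostgreSQL's built-in partitioning (inheritance plus constraint
exclusion) is a better fit than hand-managed datatable_a/datatable_b
tables. A minimal sketch, assuming a hypothetical datatable keyed on an
integer id column (names and range boundaries are made up for
illustration):

-- Parent table; all queries go through this one.
CREATE TABLE datatable (
    id      integer NOT NULL,
    payload text
);

-- Child tables, each holding a fixed range of ids.
CREATE TABLE datatable_a (
    CHECK (id >= 0 AND id < 1000000)
) INHERITS (datatable);

CREATE TABLE datatable_b (
    CHECK (id >= 1000000 AND id < 2000000)
) INHERITS (datatable);

-- Route inserts on the parent to the matching child.
CREATE OR REPLACE FUNCTION datatable_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.id < 1000000 THEN
        INSERT INTO datatable_a VALUES (NEW.*);
    ELSE
        INSERT INTO datatable_b VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER datatable_insert_trigger
    BEFORE INSERT ON datatable
    FOR EACH ROW EXECUTE PROCEDURE datatable_insert();

With constraint_exclusion = on, a SELECT on datatable that filters by
id scans only the children whose CHECK constraints can match, so you
get the benefit of smaller tables without the application having to
know which datatable_* holds a given row.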
Regards
Amitabh Kant