From: nair rajiv <nair331(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject: Re: splitting data into multiple tables
Date: 2010-01-26 03:48:54
Message-ID: d67ff5e61001251948i2d5faf28kdb1ae7dabd8de627@mail.gmail.com
Lists: pgsql-performance
On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
> > On Tue, Jan 26, 2010 at 1:01 AM, Craig James
> > <craig_james(at)emolecules(dot)com> wrote:
> > I am working on a project that will take structured content out of
> > wikipedia and put it in our database. Before putting the data into the
> > database, I wrote a script to find out the number of rows every table
> > would have after the data is in, and I found there is a table which
> > will have approximately 50,000,000 rows after data harvesting.
> > Is it advisable to keep so much data in one table?
> Depends on your access patterns, i.e. how many rows you are accessing at
> the same time, and whether those have some common locality.
>
I'll give a brief idea of how this table looks. The important columns are
subject, predicate and object. Given a predicate and an object, one should
be able to get all the subjects; given a subject and a predicate, one
should be able to retrieve all the objects. I have created indexes on
these three columns.
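
For concreteness, a minimal sketch of such a triple table with composite
indexes matching the two lookup patterns described above (the column types
and the example values are illustrative assumptions, not the actual schema):

  CREATE TABLE datatable (
      subject   text NOT NULL,
      predicate text NOT NULL,
      object    text NOT NULL
  );

  -- One composite index per access pattern:
  -- (predicate, object) -> subjects, (subject, predicate) -> objects.
  CREATE INDEX datatable_pred_obj_idx  ON datatable (predicate, object);
  CREATE INDEX datatable_subj_pred_idx ON datatable (subject, predicate);

  -- Example lookups served by the indexes above (values hypothetical).
  SELECT subject FROM datatable
   WHERE predicate = 'birthPlace' AND object = 'Pune';
  SELECT object FROM datatable
   WHERE subject = 'Rajiv' AND predicate = 'birthPlace';

For these two query shapes, composite indexes on the column pairs used
together in the WHERE clause tend to serve lookups better than three
independent single-column indexes.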
>
>
> > I have read about 'partitioning' a table. Another idea I have is to
> > break the table into different tables after the number of rows in a
> > table has reached a certain limit, say 1,000,000. For example, dividing
> > a table 'datatable' into 'datatable_a', 'datatable_b', each having
> > 1,000,000 rows.
> > I needed advice on whether I should go for partitioning or the
> > approach I have thought of.
> Your approach is pretty close to partitioning - except that partitioning
> makes the split mostly invisible to the outside, so it is imho
> preferable.
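
For reference, a minimal sketch of such partitioning on the PostgreSQL of
that era (8.x) uses table inheritance plus CHECK constraints; the split on
the first letter of subject is an assumption for illustration, and the
parent table is the 'datatable' from the earlier sketch:

  -- The parent stays empty; the child tables carry the rows.
  CREATE TABLE datatable_a (
      CHECK (subject < 'n')
  ) INHERITS (datatable);

  CREATE TABLE datatable_b (
      CHECK (subject >= 'n')
  ) INHERITS (datatable);

  -- With constraint exclusion enabled, the planner skips children whose
  -- CHECK constraint rules them out for a given query.
  SET constraint_exclusion = on;
  SELECT object FROM datatable WHERE subject = 'Pune';

Routing inserts to the correct child needs a trigger or rule on the
parent; declarative PARTITION BY in PostgreSQL 10 and later builds all of
this in.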
>
> > We have an HP server with 32GB RAM and 16 processors. The storage has
> > 24TB of disk space (1TB/HD). We have put them on RAID-5. It will be
> > great if we could know the parameters that can be changed in the
> > postgres configuration file so that the database makes maximum
> > utilization of the server we have, e.g. parameters that would increase
> > the speed of inserts and selects.
> Not using RAID-5 possibly would be a good start - many people (me
> included) have experienced bad write performance on it. It depends a
> great deal on the controller/implementation though.
> RAID-10 is normally considered more advantageous despite its lower
> usable-space ratio.
> Did you create one big RAID-5 out of all disks? That's not a good idea,
> because it's pretty likely that another disk will fail while you restore
> a previously failed disk. Unfortunately, in that configuration that
> means you have lost your complete data (in the most common
> implementations at least).
>
No, I am using only 12TB, i.e. 12 HDs of the 24TB I have.
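
On the configuration question above: a hedged starting point for a 32GB
machine, using 8.4-era postgresql.conf parameters. The values here are
assumptions to be validated against the actual workload, and
checkpoint_segments in particular was later removed in PostgreSQL 9.5:

  # values are illustrative, not tuned recommendations
  shared_buffers = 8GB            # commonly ~25% of RAM
  effective_cache_size = 24GB     # planner hint: OS cache + shared_buffers
  work_mem = 64MB                 # per sort/hash operation, per backend
  maintenance_work_mem = 1GB      # speeds up CREATE INDEX and VACUUM
  checkpoint_segments = 64        # fewer, larger checkpoints for bulk loads
  wal_buffers = 16MB

For the initial bulk load itself, COPY is usually far faster than
row-by-row INSERTs, and building the indexes after the load rather than
before it avoids maintaining them during the load.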
>
> Andres
>
> PS: Your lines are strangely wrapped...
>