Re: COPY command details

From: Benjamin Arai <benjamin(at)araisoft(dot)com>
To: Gerald Timothy G Quimpo <gerald(dot)quimpo(at)qualservcentral(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: COPY command details
Date: 2007-03-30 05:15:20
Message-ID: 460C9CE8.7070408@araisoft.com
Lists: pgsql-general

I have one system in which I have used partitioning. For this particular
case I have tons of data spanning about 50 years. What I did was write a
small loader that breaks the data into tables based on date, so I have
tables like abc_2000, abc_2001, etc. The loading script is only a couple
hundred lines of code. The only part that took a little bit of work was
allowing easy access to the data for the devs. I did this by writing a
few PL functions to automatically union the tables and produce results.
So a function like getData(startDate, endDate) would run a union query
over the respective date ranges.
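
In rough outline, the accessor function is a minimal sketch like the one
below (not the exact code; the date column, which I call "ts" here, and
the assumption that every abc_* table shares the same layout are
placeholders):

  CREATE OR REPLACE FUNCTION getData(startDate date, endDate date)
  RETURNS SETOF abc_2000 AS $$
  DECLARE
      yr  integer;
      q   text := '';
      rec abc_2000%ROWTYPE;
  BEGIN
      -- build one SELECT per yearly table covered by the range
      FOR yr IN EXTRACT(year FROM startDate)::int ..
                EXTRACT(year FROM endDate)::int LOOP
          IF q <> '' THEN
              q := q || ' UNION ALL ';
          END IF;
          q := q || 'SELECT * FROM abc_' || yr
                 || ' WHERE ts BETWEEN ' || quote_literal(startDate::text)
                 || ' AND ' || quote_literal(endDate::text);
      END LOOP;
      -- run the combined query and stream the rows back to the caller
      FOR rec IN EXECUTE q LOOP
          RETURN NEXT rec;
      END LOOP;
      RETURN;
  END;
  $$ LANGUAGE plpgsql;

The devs then just call it with a date range, e.g.:

  SELECT * FROM getData('2000-03-01', '2003-06-30');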

Benjamin

P.S. If there is anything about the data access portion that can be
improved, please let me know and feel free to voice your opinions. I am
always looking for new ways to make this particular database faster.

Gerald Timothy G Quimpo wrote:
> On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
>
>> Rebuilding an index can't be the PostgreSQL solution for all
>> cases. I am dealing with databases in the hundreds of gigs
>> range and I am adding about 10 gigs of data a week. At
>> some point it's going to take longer than a week to rebuild
>> all of the indexes in the database.
>>
>> On the other hand, if I am to partition the data into
>> several tables then it might not be such a big deal since
>> I am only adding and never deleting... This makes it a
>> little more of a pain in the ass.
>>
>
> I am leaning toward a policy of always partitioning large
> tables. I haven't found the time to do it properly yet; I'm
> still thinking about it, hoping that someone who's done it
> will chime in with their rules of thumb.
>
> Like Benjamin, I have a database that is close to 600GB
> for 2.25 years of data, and if I were to add the other
> 4 years of data that we have archived away, it will easily
> go into the terabyte range. There are a few individual
> tables which approach 100GB all by themselves.
>
> As it is, I can't afford to reindex or even back up
> (pg_dump or any other method) or run other administrative
> tasks on those tables, since the processes take too long.
> There are workarounds: I could back up single tables at
> slack times, which would allow me to do a complete backup
> (but not self-consistent as a set) over the course of a
> week or so.
>
> So I'm leaning toward partitioning, perhaps selecting
> partition rules so that no table will be larger than
> around 5GB, at which point, reindex or admin procedures
> that take exclusive locks now become only minor
> inconveniences rather than showstoppers.
>
> How do people take consistent backups of very large
> databases on Linux/FreeBSD? I'm aware of PITR, but
> might not be able to set aside a box with enough
> drives for it. LVM Snapshot? performance issues with
> LVM, etc?
>
> tiger
>
>
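
On the single-table backups you mention as a workaround: something along
these lines per table, run during slack periods, should do it (the table
and database names here are just placeholders):

  pg_dump -t abc_2001 -Fc -f abc_2001.dump yourdb

If, as in my case, the older yearly tables are essentially static once
loaded, their dumps stay valid, so only the current year's table needs
to be re-dumped regularly.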
