Re: Partitioning of a dependent table not based on date

From: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Partitioning of a dependent table not based on date
Date: 2014-12-02 10:26:48
Message-ID: 53AC2E6F-CC02-4EC7-B358-5DD54B6CCC5A@unicell.co.il
Lists: pgsql-general


On 01/12/2014, at 19:26, Andy Colson wrote:

> On 12/1/2014 11:14 AM, Herouth Maoz wrote:
>> I am currently in the process of creating a huge archive database that
>> contains data from all of our systems, going back for almost a decade.
>>
>> Most of the tables fall into one of two categories:
>>
>> 1. Static tables, which are rarely updated, such as lookup tables or
>> user lists. I don't intend to partition these, I'll just refresh them
>> periodically from production.
>> 2. Transaction tables, that have a timestamp field, for which I have the
>> data archived in COPY format by month. Of course a monolithic table over
>> a decade is not feasible, so I am partitioning these by month.
>>
>> (I don't mean "transaction" in the database sense, but in the sense that
>> the data represents historical activity, e.g. message sent, file
>> downloaded etc.)
>>
>> I have one table, though, that doesn't fall into this pattern. It's a
>> many-to-one table relating to one of the transaction tables. So on one
>> hand, it doesn't have a time stamp field, and on the other hand, it has
>> accumulated lots of data over the last decade so I can't keep it
>> unpartitioned.
>>
>
> Let's stop here. One big table with lots of rows (and a good index) isn't a problem. As long as you are not table scanning everything, there isn't a reason to partition the table.
>
> Lots and lots of rows isn't a problem except for a few usage patterns:
> 1) delete from bigtable where (some huge percent of the rows)
> 2) select * from bigtable where (lots and lots of table scanning and can't really index)
>
> If your index is selective enough, you'll be fine.

Hmm. I suppose you're right. I planned the whole partitioning scheme in the first place because most of my "transaction" tables are still alive, so I'll need to keep bulk-inserting data every month, and inserting into a fresh partition is better than inserting into one huge table. Roughly, the scheme looks like the sketch below.
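
This is a minimal sketch of the monthly scheme I mean, using table inheritance (all names and the file path are simplified examples):

    -- Parent table; columns are illustrative.
    CREATE TABLE messages (
        id       bigint,
        sent_at  timestamp NOT NULL,
        body     text
    );

    -- One child table per month, with a CHECK constraint the planner can use.
    CREATE TABLE messages_2014_12 (
        CHECK (sent_at >= '2014-12-01' AND sent_at < '2015-01-01')
    ) INHERITS (messages);

    -- Bulk-load the month's archived COPY file directly into the child:
    COPY messages_2014_12 FROM '/archive/messages_2014_12.copy';

With constraint_exclusion left at its default of "partition", queries against messages that filter on sent_at skip the children for the other months.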

But in this case, since we have stopped working on this application in January, there will be no fresh inserts so it's not as important. We just need the archive for legal purposes.
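
So for that many-to-one table I'll follow your suggestion and keep it as a single table, just making sure the join key is indexed and that the plans actually use it. Roughly (the names here are made up):

    -- Index the key that relates the detail rows to their transaction.
    CREATE INDEX message_parts_message_id_idx
        ON message_parts (message_id);

    -- Sanity check: this should show an Index Scan, not a Seq Scan.
    EXPLAIN SELECT * FROM message_parts WHERE message_id = 12345;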

One thing, though: I noticed on my other system (a reports system that holds a year's worth of data) that after I partitioned the largest tables, backup time dropped. I suppose pg_dump of a single huge table is not as fast as pg_dump of multiple smaller ones.
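
Part of that may be parallelism: with the directory output format, pg_dump can dump several tables at once (-j, available since 9.3), so a set of monthly partitions can be dumped in parallel where one monolithic table cannot. Something like (the database name is just an example):

    pg_dump -Fd -j 4 -f /backup/archive_dump archivedb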

Herouth
