From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: What needs to be done for real Partitioning? |
Date: | 2005-03-21 16:32:53 |
Message-ID: | 1111422773.4675.33.camel@fuji.krosing.net |
Lists: | pgsql-performance |
On Sat, 2005-03-19 at 12:02 -0800, Josh Berkus wrote:
> Folks,
>
> I may (or may not) soon have funding for implementing full table partitioning
> in PostgreSQL.
If you don't get it, contact me, as there is a small possibility that I
know a company interested enough to fund (some of) it :)
> I thought it would be a good idea to discuss with people here
> who are already using pseudo-partitioning what things need to be added to
> Postgresql in order to make full paritioning a reality; that is, what do
> other databases do that we don't?
As these are already discussed in this thread, I'll try to outline a
method of providing a global index (unique or not) in a way that will
still make it possible to quickly remove (and not-quite-so-quickly add)
a partition.
The structure is inspired by the current way of handling tables larger than 1 GB.
As each tid contains a 32-bit page pointer, we have a pointer space of
35184372088832 bytes/index (4G pages of 8 kB each). Currently this is mapped
directly onto 1 GB (128k-page) files, but we can, with minimal
changes to indexes, put a lookup table between the index and the page lookup.
In the case of a global index over partitions, this table could point to 1 GB
subfiles belonging to different partition tables.
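To make the arithmetic and the proposed indirection concrete, here is a minimal sketch in Python rather than backend C. The class `SegmentMap`, its methods, and the table names are hypothetical illustrations, not existing PostgreSQL structures; only the 8 kB page size, the 32-bit page pointer and the 1 GB file size come from the description above.

```python
PAGE_SIZE = 8 * 1024             # 8 kB pages
PAGES_PER_SEGMENT = 128 * 1024   # 1 GB file / 8 kB page = 128k pages
PAGE_POINTER_BITS = 32           # page pointer part of a tid


def pointer_space_bytes():
    """Bytes addressable through a 32-bit page pointer."""
    return (1 << PAGE_POINTER_BITS) * PAGE_SIZE


def direct_lookup(page_pointer):
    """Current scheme: file number and page within the file are computed directly."""
    return divmod(page_pointer, PAGES_PER_SEGMENT)


class SegmentMap:
    """Hypothetical lookup table: global slot -> (partition table, 1 GB subfile number)."""

    def __init__(self):
        self.slots = []

    def add_segment(self, table, subfile):
        """Register one 1 GB subfile of a partition and return its global slot."""
        self.slots.append((table, subfile))
        return len(self.slots) - 1

    def resolve(self, page_pointer):
        """Translate a global page pointer into (table, subfile, page in subfile)."""
        slot, page = divmod(page_pointer, PAGES_PER_SEGMENT)
        table, subfile = self.slots[slot]
        return table, subfile, page
```

With two one-file partitions registered in order, a page pointer of `PAGES_PER_SEGMENT + 5` would resolve to page 5 of the first subfile of the second partition. The only extra work compared to `direct_lookup` is one list index.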
Dropping a partition can also be fast - just mark its pages as deleted in the
lookup table, which means one change per 1 GB of dropped table.
The next vacuum should then free the index pointers to the deleted subfiles.
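A rough sketch of the drop path, again in Python and under stated assumptions: the map is modelled as a plain list of `(table, subfile)` entries, `None` stands for a deleted slot, and index entries are simplified to `(key, page_pointer)` pairs. The function names and table names are made up for illustration.

```python
PAGES_PER_SEGMENT = 128 * 1024   # 1 GB file / 8 kB page
DELETED = None                   # assumed marker for a dropped 1 GB slot


def drop_partition(segment_map, table):
    """Mark every slot of `table` as deleted; return the number of map changes."""
    changed = 0
    for slot, entry in enumerate(segment_map):
        if entry is not DELETED and entry[0] == table:
            segment_map[slot] = DELETED
            changed += 1
    return changed


def is_live(segment_map, page_pointer):
    """True if the page pointer falls into a slot that is still mapped."""
    slot = page_pointer // PAGES_PER_SEGMENT
    return slot < len(segment_map) and segment_map[slot] is not DELETED


def vacuum_index(index_entries, segment_map):
    """Later cleanup pass: keep only entries that point into live slots."""
    return [(key, ptr) for key, ptr in index_entries if is_live(segment_map, ptr)]
```

Dropping a 2 GB partition touches two map entries regardless of how many rows it held; the index itself is only cleaned up later by the vacuum-like pass.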
Adding partitions is trickier.
If the added table forms part of the partitioning index (say names from C to
E), and there is a matching index on the subtable,
then that part of the btree can probably be copied into the main btree index as
a tree branch, which should be relatively fast (compared to building it
one tid at a time).
Otherwise, adding the index could probably also be sped up by some kind of
index merge - faster than building from scratch but slower than the above.
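The two cases can be illustrated on sorted lists standing in for btree leaf entries. This is only an analogy, not btree code: splicing a sorted run into a gap plays the role of attaching a branch, and `heapq.merge` plays the role of the index merge. The function name `add_partition_index` is hypothetical, and remapping the partition's tids to global page pointers is left out.

```python
import bisect
import heapq


def add_partition_index(global_entries, partition_entries):
    """Combine two sorted lists of (key, tid) pairs.

    Returns (new_entries, strategy), where strategy is "graft" when the
    partition's key range does not overlap any existing key, else "merge".
    """
    if not partition_entries:
        return list(global_entries), "noop"

    lo = partition_entries[0][0]
    hi = partition_entries[-1][0]
    keys = [key for key, _ in global_entries]
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)

    if start == end:
        # No existing key lies in [lo, hi]: insert the whole run in one piece.
        return global_entries[:start] + partition_entries + global_entries[end:], "graft"

    # Key ranges interleave: fall back to a linear merge of the two sorted inputs.
    return list(heapq.merge(global_entries, partition_entries)), "merge"
```

A partition holding names from C to E added to an index that has no such names takes the first path; a partition whose keys interleave with existing ones takes the second.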
To repeat - the global index over a partitioned table should have the same
structure as our current b-tree index, only with an added map from 128k-page index
partitions to 1 GB subfiles of (possibly different) tables. This map will
be quite small - for 1 TB of data it will be only 1k entries - so it will
fit in cache on all modern processors and thus should add only a tiny
slowdown compared to the current direct tid.page/128k method.
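The size estimate is easy to check. In the sketch below the 8 bytes per map entry is purely an assumption for illustration; the mail does not specify an entry layout.

```python
SEGMENT_BYTES = 1 << 30    # one map entry per 1 GB subfile
ASSUMED_ENTRY_BYTES = 8    # assumption: not specified in the proposal


def map_entries(data_bytes):
    """Number of map entries needed to cover `data_bytes` (rounded up)."""
    return -(-data_bytes // SEGMENT_BYTES)


def map_size_bytes(data_bytes):
    """Approximate map size under the assumed entry width."""
    return map_entries(data_bytes) * ASSUMED_ENTRY_BYTES
```

For 1 TB (`1 << 40` bytes) this gives 1024 entries, or 8 kB under the assumed entry width.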
--
Hannu Krosing <hannu(at)tm(dot)ee>