From: Richard Huxton <dev(at)archonet(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Storage Model for Partitioning
Date: 2008-01-11 13:26:13
Message-ID: 47876E75.3040001@archonet.com
Lists: pgsql-hackers
Simon Riggs wrote:
> On Fri, 2008-01-11 at 11:34 +0000, Richard Huxton wrote:
>
>> Is the following basically the same as option #3 (multiple RelFileNodes)?
>>
>> 1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a
>> contiguous range of blocks.
>> 2. Make a table-partition (implied or explicit constraints) map to
>> multiple "chunks".
>> That would reduce fragmentation (you'd have on average 32MB's worth of
>> blocks wasted per partition) and allow for stretchy partitions at the
>> cost of an extra layer of indirection.
>>
>> For the single-partition case you'd not need to split the file of
>> course, so it would end up looking much like the current arrangement.
>
> We need to think about the "data model" of the storage layer. Space
> itself isn't the issue; it's the assumptions that all of the other
> subsystems currently make about how a table is structured, indexed,
> accessed and manipulated.
Which is why I was thinking you'd want indexes etc. to carry on treating
a table as a contiguous set of blocks, with the mapping to an actual
on-disk block taking place below that level. (If I've understood you.)
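To make that concrete, here's a minimal sketch of the indirection I have
in mind (ChunkMap and chunk_lookup are names I've invented for
illustration, not actual PostgreSQL code): everything above this layer
keeps using a table-wide logical block number, and only the lowest-level
lookup knows about chunks.

/*
 * Hypothetical sketch, not PostgreSQL source: a per-relation chunk map
 * translating a logical block number into an on-disk location.
 */
#include <stdint.h>

#define BLCKSZ        8192                          /* PostgreSQL default */
#define CHUNK_BLOCKS  (64 * 1024 * 1024 / BLCKSZ)   /* 64MB = 8192 blocks */

typedef struct ChunkMap
{
    uint32_t    nchunks;        /* chunks currently allocated */
    uint32_t   *chunk_file;     /* which physical file holds chunk i */
    uint32_t   *chunk_base;     /* first block of chunk i in that file */
} ChunkMap;

/* Map a table-wide logical block number to (file, block-within-file). */
static int
chunk_lookup(const ChunkMap *map, uint32_t logical_blkno,
             uint32_t *file, uint32_t *phys_blkno)
{
    uint32_t    chunk = logical_blkno / CHUNK_BLOCKS;
    uint32_t    offset = logical_blkno % CHUNK_BLOCKS;

    if (chunk >= map->nchunks)
        return -1;              /* beyond end of relation */

    *file = map->chunk_file[chunk];
    *phys_blkno = map->chunk_base[chunk] + offset;
    return 0;
}

The point being that indexes, the buffer manager and friends never see
the chunk boundaries at all.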
> Currently: Table 1:M Segments
>
> Option 1: Table 1:M Segments and *separately* Table 1:M Partitions, so
> partitions always have a maximum size. The size just changes the scale
> of the impact; it doesn't remove the problems of holes, max sizes etc.
> e.g. empty table with 10 partitions would be
> a) 0 bytes in 1 file
> b) 0 bytes in 1 file, plus 9GB in 9 files all full of empty blocks
Well, presumably 0GB in 10 files, but 10GB-worth of block-numbers
"pre-allocated".
> e.g. table with 10 partitions each of 1.5GB would be
> a) 15 GB in 15 files
With the limitation that any given partition might contain a mix of
data-ranges (e.g. 2005 lies half in partition 2 and half in partition 3).
> b) hit max size limit of partition: ERROR
In the case of 1b, you could have a segment mapping to more than one
partition, avoiding the error. So 2004 data is in partition 1, 2005 is
in partitions 2 and 3 (where 3 is half empty), and 2006 is in partition
4. However, this does mean you've got a lot of wasted block numbers. If
you were using explicit (fixed) partitioning and chose a bad set of
criteria, your maximum table size could be substantially reduced.
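A back-of-envelope illustration of that address-space cost, assuming
PostgreSQL's 32-bit block numbers and 8KB blocks (the fixed 1GB range
per partition is just an example):

/* How pre-allocated block ranges shrink the usable address space. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t blcksz = 8192;
    const uint64_t total_blocks = UINT64_C(1) << 32;  /* 32-bit BlockNumber */
    const uint64_t part_blocks = (UINT64_C(1) << 30) / blcksz; /* 1GB range */

    printf("addressable table size: %llu TB\n",
           (unsigned long long) (total_blocks * blcksz >> 40));
    printf("max partitions with fixed 1GB ranges: %llu\n",
           (unsigned long long) (total_blocks / part_blocks));
    return 0;
}

With those numbers you get a 32TB addressable table and at most 32768
fixed-size partitions; a partition that stays half empty still burns its
whole 1GB range of block numbers.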
> Option 2: Table 1:M Child Tables 1:M Segments
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files
Cross-table indexes and constraints would be useful outside of the
current scenario.
> Option 3: Table 1:M Nodes 1:M Segments
> e.g. empty table with 10 partitions would be
> 0 bytes in each of 10 files
>
> e.g. table with 10 partitions each of 1.5GB would be
> 15GB in 10 groups of 2 files
Ah, so this does seem to be roughly what I was rambling about. This
would presumably mean that rather than (table, block #) specifying the
location of a row, you'd need (table, node #, block #).
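For what it's worth, today's heap TID is effectively (block #, line #),
a 32-bit block number plus a 16-bit line pointer, and every index entry
stores one. A sketch of how the row address might widen under option 3
(struct names and field widths purely illustrative, not a proposal):

#include <stdint.h>

/* Roughly today's row address: block within the single relation node. */
typedef struct CurrentTid
{
    uint32_t    blkno;          /* block within the relation */
    uint16_t    posid;          /* line pointer within the block */
} CurrentTid;

/* Under option 3 the address would gain a node component. */
typedef struct NodeTid
{
    uint16_t    nodeno;         /* which node within the table */
    uint32_t    blkno;          /* block within that node */
    uint16_t    posid;          /* line pointer within the block */
} NodeTid;

Since TIDs live in every index entry, widening them grows every index on
the table, which I assume is part of the implications you're worried
about.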
> So 1b) seems definitely out.
>
> The implications of 2 and 3 are what I'm worried about, which is why the
> shortcomings of 1a) seem acceptable currently.
--
Richard Huxton
Archonet Ltd