Quick Links

Re: Auto Partitioning

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 18:55:26
Message-ID:	4613F49E.1030901@bluegap.ch
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

Hi,

Gregory Stark wrote:
> However there are also cases such as where you have a=0..99 in one partition
> and a=100..199 in partition two, etc. It could still automatically build
> indexes on (a,b,c) on each partition and somehow note that the unique
> constraint is guaranteed across the whole partitioned table.

Uhm... yes, because 'a' is the partitioning key.

According to my outline for partitioning rule sets, you would have a
split @ a <= 100. Probably another one @ a <= 200, etc... but none the
less, 'a' is the only column needed to decide what partition a row has
to end up in, so 'a' is the only column in the partitioning key.

What I'm saying is, that given your example, it's not easily possible to
have an index on (b,a) even if 'a' is also in the partitioning key. It's
very well possible to emulate a multi-table index on (a,b), though.

Brainstorming about this somewhat more: how about having multiple
columns in the partitioning key, i.e. 'a' and 'b', and the following
rule set (which admittedly is somewhat special):

table sample
|
|
split @ a >= 100
/ \
/ \
split @ b >= 100 part3
/ \
/ \
part1 part2

An index on (a, b) could easily be 'emulated' by having such an index on
all the partitions, but can we have an index on (b, a) like that?
Probably not, because at the first split, we would have to duplicate.
I.e. for an index scan on 'b = 22', we would have to scan the index on
part3 as well as part1.

Thus one can say, that an multi-table index can only easily be
'emulated', if it has the same columns as the partitioning key, in the
same order. For the above example, these ones would be possible:

(a)
(a,b)
(a,b,...)

Yet another thought: the emulation of multi-table indexes, in this case,
is like concatenating the indexes of the partitions in the right order.
Asking for an index scan for 'WHERE a >= 95 AND a <= 105' when having a
split at a >= 100, you would have to start on the index in the left
bucket (with a < 100) and return everything until the end of the index,
then continue on the index in the right bucket (with a >= 100). So you
also have to be able to determine an order, which is easily possible for
splits, but not so simple for modulos (hash partitioning).

For such a modulo node, the executor would have to start multiple index
scans, i.e.:

When scanning for a range (i.e. 'WHERE id >= 5 AND id <= 17'), the
planner would have to request an index scan on each of the partition,
joining the results in the right order.

So, why not completely emulate all multi-table index scans? The above
restriction would disappear, if we could teach the planner and executor
how to join multiple index scan results, no?

Questioning the other way around: do we need any sort of multi-table
indexes at all, or isn't it enough to teach the planner and executor how
to intelligently scan through (possibly) multiple indexes to get what is
requested?

Regards

Markus

In response to

Re: Auto Partitioning at 2007-04-04 18:06:50 from Gregory Stark

Responses

Re: Auto Partitioning at 2007-04-04 19:01:45 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Zoltan Boszormenyi	2007-04-04 18:57:02	Re: IDENTITY/GENERATED v36 Re: Final version of IDENTITY/GENERATED patch
Previous Message	Simon Riggs	2007-04-04 18:52:50	Re: Synchronized Scan benchmark results

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Zoltan Boszormenyi	2007-04-04 18:57:02	Re: IDENTITY/GENERATED v36 Re: Final version of IDENTITY/GENERATED patch
Previous Message	FAST PostgreSQL	2007-04-04 18:36:52	Updateable cursors patch