Re: Partitioning option for COPY

From: Emmanuel Cecchet <manu(at)asterdata(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioning option for COPY
Date: 2009-11-23 14:39:33
Message-ID: 4B0A9EA5.4020708@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon,

I think you should read the thread and the patch before making any false
statements like you did in your email.

1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations
in the datawarehouse world. It is not meant to implement full
partitioning as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a
meta-data representation of partitions which does not exist yet and will
probably not exist in 8.5

You should justify your statements when you say 'potentially buggy in
its approach to developing a cache and using trigger-like stuff'. I
understand that you don't like it because this is not what you want but
this is not my fault. This is not an implementation of partitioning like
COPY does not do update/delete/alter/...
And yes the use case is 'narrow' like any option in COPY. It is like
complaining that the CSV option is not useful because you want to load
binary dumps.

If Itagaki gets the support of the community to get his implementation
accepted, I will gladly use it. Contributing? If Aster is willing to
contribute a code monkey to implement your specs, why not but you will
have to convince them.

You should really think twice about the style of your emails that cast a
detestable tone to discussions on pg-hackers.

Emmanuel

> On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
>
>> Hi,
>>
>>>> I have extracted the partitioning option for COPY (removed the error
>>>> logging part) from the previous patch.
>>>>
>>>>
>>> We can use an INSERT trigger to route tuples into partitions even now.
>>> Why do you need an additional router for COPY?
>>>
>> Tom has already explained on the list why using a trigger was a bad idea
>> (and I know we can use a trigger since I am the one who wrote it).
>> If you look at the code you will see that you can do optimizations in
>> the COPY code that you cannot do in the trigger.
>>
>>
>>> Also, it would be nicer
>>> that the router can works not only in COPY but also in INSERT.
>>>
>>>
>> As 8.5 will at best provide a syntactic hack on top of the existing
>> constraint implementation, I think that it will not hurt to have routing
>> in COPY since we will not have it anywhere otherwise.
>>
>>> BTW, I'm working on meta data of partitioning now. Your "partitioning"
>>> option in COPY could be replaced with the catalog.
>>>
>>>
>> This implementation is only for the current 8.5 and it will not be
>> needed anymore once we get a fully functional partitioning in Postgres
>> which seems to be for a future version.
>>
>
> Yes, the trigger way of doing this is a bad way.
>
> I regret to say that the way proposed here isn't much better, AFAICS.
> Let me explain why I think that, but -1 to anyone applying this patch.
>
> This patch proposes keeping a cache of last visited partitions to reduce
> the overhead of data routing.
>
> What I've requested is that partitioning work by using a data structure
> held in relcache for inheritance parents. This differs in 3 ways from
> this patch
> a) it has a clearly defined location for the cached metadata, with
> clearly identified and well worked out mechanisms for cache invalidation
> b) the cache can be built once when it is first needed, not slowly grown
> as parts of the metadata are used
> c) it would be available for all parts of the server, not just COPY.
>
> The easiest way to build that metadata is when structured partitioning
> info is available. i.e. the best next action is to complete and commit
> Itagaki's partitioning syntax patch. Then we can easily build the
> metadata for partitioning, which can then be used in COPY for data
> routing.
>
> Anyway, I want data routing, as is the intention of this patch. I just
> don't think this patch is a useful way to do it. It is too narrow in its
> scope and potentially buggy in its approach to developing a cache and
> using trigger-like stuff.
>
> ISTM that with the right metadata in the right place, a cleaner and
> easier solution is still possible for 8.5. The code within COPY should
> really just reduce to a small piece of code to derive the correct
> relation for the desired row and then use that during heap_insert().
>
> I have just discussed partitioning with Itagaki-san at JPUG, so I know
> his plans. Itagaki-san and Manu, please can you work together to make
> this work for 8.5?
>
> ---
>
> A more detailed explanation of Partitioning Metadata:
>
> Partitioning Metadata is information held on the relcache for a table
> that has child partitions. Currently, a table does not cache info about
> its children, which prevents various optimisations.
>
> We would have an extra pointer on the Relation struct that points to a
> PartitioningMetadata struct. We can fill in this information when we
> construct the relcache for a relation, or we can populate it on demand
> the first time we attempt to use that information (if it exists).
>
> We want to hold an array of partition boundary values. This will then
> allow us to use bsearch to find the partition that a specific value
> applies to. Thus it can be used for routing data from INSERTs or COPY,
> can be used for identifying which partitions need to be
> included/excluded from an APPEND node. Using this will be O(logN) rather
> than O(N), so allowing us to have much larger number of partitions when
> required. Note that it can also be used within the executor to perform
> dynamic partition elimination, thus allowing us to easily implement
> partition aware joins etc.
>
> To construct the array we must sort the partition boundary values and
> prove that the partition definitions do not overlap. That is much easier
> to do when the partitions are explicitly defined. (Plus, there is no
> requirement to have, or mechanism to specify, unique partitions
> currently, although most users assume this in their usage).
>
> I imagine we would have an API called something like
> RelationIdentifyPartition() where we provide value(s) for the
> PartitioningKey column(s) and we then return the Oid of the partition
> that holds that value. That function would build the metadata, if not
> already cached, then bsearch it to provide the Oid.
>
>

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-11-23 14:59:18 Re: Partitioning option for COPY
Previous Message Tom Lane 2009-11-23 14:38:34 Re: [HACKERS] Updating column on row update