From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Declarative partitioning - another take |
Date: | 2016-11-18 10:59:08 |
Message-ID: | 2587c47e-7ff7-80db-953f-fb532c215219@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2016/11/18 4:14, Robert Haas wrote:
> On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Meanwhile, here are updated patch that address most of the following comments.
>
> OK, I have re-reviewed 0005 and it looks basically fine to me, modulo
> a few minor nitpicks. "This is called, *iff*" shouldn't have a comma
> there,
Fixed.
> and I think the entire comment block that starts with "NOTE:
> SQL specifies that a NULL" and ends with "it's unlikely that NULL
> would result." should be changed to say something like /* As for
> catalogued constraints, we treat a NULL result as a success, not a
> failure. */ rather than duplicating an existing comment that doesn't
> quite apply here.
Ah, you're right that the comment does not apply as it is. I rewrote that
comment.
Oh but wait, that means I can insert rows with NULLs in the range
partition key if I choose to insert it directly into the partition,
whereas I have been thinking all this while that there could never be
NULLs in the partition key of a range partition. What's more,
get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for
every partition key column in case of a range partition. Is that
wrongheaded altogether? (also see my reply to your earlier message about
NULLs in the range partition key)
> Finally, ExecConstraints() contains a new if-block
> whose sole contents are another if-block. Perhaps if (this && that)
> would be better.
Agreed, should have noticed that.
> Regarding 0006 and 0007, I think the PartitionTreeNodeData structure
> you've chosen is awfully convoluted and probably not that efficient.
> For example, get_partition_for_tuple() contains this loop:
>
> + prev = parent;
> + node = parent->downlink;
> + while (node != NULL)
> + {
> + if (node->index >= cur_idx)
> + break;
> +
> + prev = node;
> + node = node->next;
> + }
>
> Well, it looks to me like that's an O(n) way to find the n'th
> partition, which seems like a pretty bad idea in performance-critical
> code, which this is. I think this whole structure needs a pretty
> heavy overhaul. Here's a proposal:
Thanks for the idea below!
> 1. Forget the idea of a tree. Instead, let the total number of tables
> in the partitioning hierarchy be N and let the number of those that
> are partitioned be K. Assign each partitioned table in the hierarchy
> an index between 0 and K-1. Make your top level data structure (in
> lieu of PartitionTreeNodeData) be an array of K PartitionDispatch
> objects, with the partitioning root in entry 0 and the rest in the
> remaining entries.
>
> 2. Within each PartitionDispatch object, store (a) a pointer to a
> PartitionDesc and (b) an array of integers of length equal to the
> PartitionDesc's nparts value. Each integer i, if non-negative, is the
> final return value for get_partition_for_tuple. If i == -1, tuple
> routing fails. If i < -1, we must next route using the subpartition
> whose PartitionDesc is at index -(i+1). Arrange for the array to be
> in the same order the PartitionDesc's OID list.
>
> 3. Now get_partition_for_tuple looks something like this:
>
> K = 0
> loop:
> pd = PartitionDispatch[K]
> idx = list/range_partition_for_tuple(pd->partdesc, ...)
> if (idx >= -1)
> return idx
> K = -(idx + 1)
>
> No recursion, minimal pointer chasing, no linked lists. The whole
> thing is basically trivial aside from the cost of
> list/range_partition_for_tuple itself; optimizing that is a different
> project. I might have some details slightly off here, but hopefully
> you can see what I'm going for: you want to keep the computation that
> happens in get_partition_for_tuple() to an absolute minimum, and
> instead set things up in advance so that getting the partition for a
> tuple is FAST. And you want the data structures that you are using in
> that process to be very compact, hence arrays instead of linked lists.
This sounds *much* better. Here is a quick attempt at coding the design
you have outlined above in the attached latest set of patches.
PS: I haven't run the patches through pgindent yet.
Thanks,
Amit
Attachment | Content-Type | Size |
---|---|---|
0001-Catalog-and-DDL-for-partitioned-tables-16.patch | text/x-diff | 115.3 KB |
0002-psql-and-pg_dump-support-for-partitioned-tables-16.patch | text/x-diff | 23.9 KB |
0003-Catalog-and-DDL-for-partitions-16.patch | text/x-diff | 208.2 KB |
0004-psql-and-pg_dump-support-for-partitions-16.patch | text/x-diff | 22.1 KB |
0005-Teach-a-few-places-to-use-partition-check-quals-16.patch | text/x-diff | 30.8 KB |
0006-Introduce-a-PartitionDispatch-data-structure-16.patch | text/x-diff | 6.6 KB |
0007-Tuple-routing-for-partitioned-tables-16.patch | text/x-diff | 33.9 KB |
0008-Update-DDL-Partitioning-chapter-to-reflect-new-devel-16.patch | text/x-diff | 24.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2016-11-18 11:40:18 | Re: Push down more full joins in postgres_fdw |
Previous Message | Emre Hasegeli | 2016-11-18 09:58:27 | Re: Floating point comparison inconsistencies of the geometric types |