From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Declarative partitioning - another take |
Date: | 2016-11-10 12:40:09 |
Message-ID: | c7972ce6-50b8-2a13-c1e5-658a32b549ca@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks again for the prompt review!
On 2016/11/10 2:00, Robert Haas wrote:
> On Wed, Nov 9, 2016 at 6:14 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> But none of that is really a problem. Sure, the 'P' case will never
>>> arise in 9.6, but so what? I'd really like to not keep duplicating
>>> these increasingly-complex hunks of code if we can find some way to
>>> avoid that.
>>
>> We cannot reference pg_catalog.pg_get_partkeydef() in the SQL query that
>> getTables() sends to pre-10 servers, right? But I suppose we need not
>> call it in that particular SQL query in the first place.
>
> Oh, yeah, that's a problem; the query will error out against older
> servers. You could do something like:
>
> char *partkeydef;
> if (version <= 90600)
> partkeydef = "NULL";
> else
> partkeydef = "CASE WHEN c.relkind = 'P' THEN
> pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END";
>
> ...and the use %s to interpolate that into the query string.
Yeah, that's a way.
>>> I think that in most places were are referring to the "partitioning
>>> method" (with ing) but the "partition key" (without ing). Let's try to
>>> be consistent.
>>
>> I'm inclined to switch to "partitioning method" and "partitioning key",
>> but do you mean just the documentation or throughout? Beside
>> documentation, I mean source code comments, error messages, etc. I have
>> assumed throughout.
>
> I think "partitioning key" is a bit awkward and actually prefer
> "partiton key". But "partition method" sounds funny so I would go
> with "partitioning method".
OK, "partition key" and "partitioning method" it is then. Source code
comments, error messages, variables call the latter (partitioning)
"strategy" though which hopefully is fine.
>>> I don't think it's OK to allow the partition to be added if it
>>> contains rows that might not be valid. We are generally vary wary
>>> about adding options that can cause data integrity failures and I
>>> think we shouldn't start here, either. On the other hand, it's also
>>> not desirable for adding a partition to take O(n) time in all cases.
>>> So what would be nice is if the new partition could self-certify that
>>> contains no problematic rows by having a constraint that matches the
>>> new partitioning constraint. Then we can skip the scan if that
>>> constraint is already present.
>>
>> I agree that NO VALIDATE is undesirable and it would be better if we could
>> get rid of the same. I want to clarify one thing though: what does it
>> mean for the new partition to have a constraint that *matches* the new
>> partitioning constraint? Does it mean the new partition's constraint
>> *implies* the partitioning constraint? Or as certified by equal()?
>
> Implies would be better, but equal() might be tolerable.
I think a equal() -based method would fail to help in most cases. Consider
the following example:
create table foo1 (a int);
alter table foo add constraint check_a check (a < 10 and a >= 1);
create table foo (a int) partition by range (a);
alter table foo attach partition foo1 for values from (1) to (10);
The last command will internally generate the partition constraints that
is basically a list of implicitly AND'd expressions viz. a is not null, a
>= 1 and a < 10 which we wrap into a BoolExpr. It would not be equal()
with foo1's constraint even though they are basically the same constraint.
So, simple structural equality may prove to be less productive, IMHO.
It seems a *implies* -based solution would work much better, although a
bit slower for obvious reasons. I reckon slownsess is not a big issue in
this case. So I prototyped the same using predicate_implied_by() which
seems to work reasonably. Looks something like this:
skip_validate = false;
if (predicate_implied_by(partConstraint, existConstraint))
skip_validate = true;
Where partConstraint is 1-member list with the new partition constraint
and existConstraint is a list of the existing constraints of the table
being attached, derived from its TupleConstr.
>>> + /*
>>> + * Never put a
>>> null into the values array, flag
>>> + * instead for
>>> the code further down below where
>>> + * we
>>> construct the actual relcache struct.
>>> + */
>>> +
>>> found_null_partition = true;
>>> +
>>> null_partition_index = i;
>>>
>>> How about we elog(ERROR, ...) if found_null_partition is already set?
>>
>> Makes sense. However, let me mention here that duplicates either within
>> one partition's list or across the partitions are not possible. That's
>> because in case of the former, we de-duplicate before storing the list
>> into the catalog and the latter would simply be an overlap error. Could
>> this be made an Assert() then?
>
> Well, I think that wouldn't be as good, because for example some
> misguided person might do a manual update of the catalog. It seems to
> me that if the system catalog contents are questionable, it's better
> to error out than to have completely undefined behavior. In other
> words, we probably want it checked in non-Assert builds. See similar
> cases where we have things like:
>
> elog(ERROR, "conpfeqop is not a 1-D Oid array");
Good point, agreed.
>>> + /*
>>> + * Is lower_val = upper_val?
>>> + */
>>> + if (lower_val && upper_val)
>>>
>>> So, that comment does not actually match that code. That if-test is
>>> just checking whether both bounds are finite. What I think you should
>>> be explaining here is that if lower_val and upper_val are both
>>> non-infinite, and if the happen to be equal, we want to emit an
>>> equality test rather than a >= test plus a <= test because $REASON.
>>
>> OK, I have added some explanatory comments around this code.
>
> So, I was kind of assuming that the answer to "why are we doing this"
> was "for efficiency". It's true that val >= const AND const <= val
> can be simplified to val = const, but it wouldn't be necessary to do
> so for correctness. It just looks nicer and runs faster.
I guess I ended up with this code following along a bit different way of
thinking about it, but in the end what you're saying is true.
> By the way, if you want to keep the inclusive stuff, I think you
> probably need to consider this logic in light of the possibility that
> one bound might be exclusive. Maybe you're thinking that can't happen
> anyway because it would have been rejected earlier, but an elog(ERROR,
> ...) might still be appropriate just to guard against messed-up
> catalogs.
Yes, that makes sense. I guess you mean a case like the one shown below:
create table foo5 partition of foo for values start (3, 10) end (3, 10);
ERROR: cannot create range partition with empty range
I think the following would suffice as a guard (checked only if it turns
out that lower_val and upper_val are indeed equal):
if (i == key->partnatts - 1 && spec->lowerinc != spec->upperinc)
elog(ERROR, "invalid range bound specification");
>>> What happens if you try to attach a table as a partition of itself or
>>> one of its ancestors?
>>
>> The latter fails with "already a partition" error.
>>
>> The former case was not being handled at all which has now been fixed.
>> ATExecAddInherit() prevents that case as an instance of preventing the
>> circularity of inheritance. It says: ERROR: circular inheritance not
>> allowed. And then: DETAIL: "rel" is already a child of "rel".
>>
>> Should ATExecAttachPartition() use the same trick and keep using the same
>> message(s)? The above detail message doesn't quite sound appropriate when
>> one tries to attach a table as partition of itself.
>
> Whichever way is easier to code seems fine. It's a pretty obscure
> case, so I don't think we need to add code just to get a very slightly
> better error message.
OK, so let's keep it same as with regular inheritance.
>>> + | FOR VALUES START '(' range_datum_list ')' lb_inc
>>> + END_P '('
>>> range_datum_list ')' ub_inc
>>>
>>> Just a random idea. Would for VALUES FROM ( ... ) TO ( ... ) be more
>>> idiomatic than START and END?
>>
>> It would actually. Should I go ahead and change to FROM (...) TO (...)?
>
> +1!
Done!
>> Related to range partitioning, should we finalize on inclusive START/FROM
>> and exclusive END/TO preventing explicit specification of the inclusivity?
>
> I would be in favor of committing the initial patch set without that,
> and then considering the possibility of adding it later. If we
> include it in the initial patch set we are stuck with it.
OK, I have removed the syntactic ability to specify INCLUSIVE/EXCLUSIVE
with each of the range bounds.
I haven't changed any code (such as comparison functions) that manipulates
instances of PartitionRangeBound which has a flag called inclusive. I
didn't remove the flag, but is instead just set to (is_lower ? true :
false) when initializing from the parse node. Perhaps, there is some scope
for further simplifying that code, which you probably alluded to when you
proposed that we do this.
>> Attached updated patches take care of the above comments and few other
>> fixes. There are still a few items I have not addressed right away:
>>
>> - Remove NO VALIDATE clause in ATTACH PARTITION and instead rely on the
>> new partition's constraints to skip the validation scan
>> - Remove the syntax to specify inclusivity of each of START and END bounds
>> of range partitions and instead assume inclusive START and exclusive END
>
> OK - what is the time frame for these changes?
I have implemented both of these in the attached patch. As mentioned
above, the logic to skip the validation scan using the new partition's
constraints is still kind of a prototype solution, but it seems to work so
far. Comments on the same would be very helpful.
Thanks,
Amit
Attachment | Content-Type | Size |
---|---|---|
0001-Catalog-and-DDL-for-partitioned-tables-12.patch | text/x-diff | 114.3 KB |
0002-psql-and-pg_dump-support-for-partitioned-tables-12.patch | text/x-diff | 23.9 KB |
0003-Catalog-and-DDL-for-partitions-12.patch | text/x-diff | 205.4 KB |
0004-psql-and-pg_dump-support-for-partitions-12.patch | text/x-diff | 22.1 KB |
0005-Teach-a-few-places-to-use-partition-check-quals-12.patch | text/x-diff | 30.6 KB |
0006-Introduce-a-PartitionTreeNode-data-structure-12.patch | text/x-diff | 8.0 KB |
0007-Tuple-routing-for-partitioned-tables-12.patch | text/x-diff | 43.7 KB |
0008-Update-DDL-Partitioning-chapter-to-reflect-new-devel-12.patch | text/x-diff | 24.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Emre Hasegeli | 2016-11-10 12:41:19 | Re: Floating point comparison inconsistencies of the geometric types |
Previous Message | Dean Rasheed | 2016-11-10 12:17:50 | Re: Improving RLS planning |