Re: Adding support for Default partition in partitioning

From: Keith Fiske <keith(at)omniti(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adding support for Default partition in partitioning
Date: 2017-04-06 14:46:12
Message-ID: CAG1_KcAS6ernbvQC65XOCDjmtvb+aacvDs-o-HjGYGrstuYzbQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 6, 2017 at 1:18 AM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

> On 2017/04/06 13:08, Keith Fiske wrote:
> > On Wed, Apr 5, 2017 at 2:51 PM, Keith Fiske wrote:
> >> Only issue I see with this, and I'm not sure if it is an issue, is what
> >> happens to that default constraint clause when 1000s of partitions start
> >> getting added? From what I gather the default's constraint is built based
> >> off the cumulative opposite of all other child constraints. I don't
> >> understand the code well enough to see what it's actually doing, but if
> >> there are no gaps, is the method used smart enough to aggregate all the
> >> child constraints to make a simpler constraint that is simply outside the
> >> current min/max boundaries? If so, for serial/time range partitioning this
> >> should typically work out fine since there are rarely gaps. This actually
> >> seems more of an issue for list partitioning where each child is a distinct
> >> value or range of values that are completely arbitrary. Won't that check
> >> and re-evaluation of the default's constraint just get worse and worse as
> >> more children are added? Is there really even a need for the default to
> >> have an opposite constraint like this? Not sure on how the planner works
> >> with partitioning now, but wouldn't it be better to first check all
> >> non-default children for a match the same as it does now without a default
> >> and, failing that, then route to the default if one is declared? The
> >> default should accept any data then so I don't see the need for the
> >> constraint unless it's required for the current implementation. If that's
> >> the case, could that be changed?
>
> Unless I misread your last sentence, I think there might be some
> confusion. Currently, the partition constraint (think of these as you
> would of user-defined check constraints) is needed for two reasons: 1. to
> prevent direct insertion of rows into the default partition for which a
> non-default partition exists; no two partitions should ever have duplicate
> rows. 2. so that planner can use the constraint to determine if the
> default partition needs to be scanned for a query using constraint
> exclusion; no need, for example, to scan the default partition if the
> query requests only key=3 rows and a partition for the same exists (no
> other partition should have key=3 rows by definition, not even the
> default). As things stand today, planner needs to look at every partition
> individually for using constraint exclusion to possibly exclude it, *even*
> with declarative partitioning and that would include the default partition.
>

I had forgotten about constraint exclusion. My follow-up email, which you
answered below, was about how to keep rows that belong elsewhere out of the
default if the default had no constraint. I guess my main concern is how
manageable that cumulative opposite constraint on the default will be over
time, especially with list partitioning, and whether it's smart enough to
consolidate constraint conditions when two or more of them cover a
continuous range.
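
To make the consolidation question concrete, here is a rough Python sketch of
what I mean. It is only an illustration, not how the patch builds the
constraint. The names (merge_ranges, default_accepts) are made up, and it
assumes integer keys with range bounds that are inclusive at the bottom and
exclusive at the top.

```python
from typing import List, Tuple

# A child partition's bounds as a half-open interval [lo, hi).
Range = Tuple[int, int]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Collapse touching or overlapping ranges into the fewest intervals."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # This range starts where the previous one ends (or earlier),
            # so extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def default_accepts(key: int, ranges: List[Range]) -> bool:
    """Hypothetical default constraint: the key lies outside every child."""
    return not any(lo <= key < hi for lo, hi in merge_ranges(ranges))


# Three contiguous children and one separated by a gap.
children = [(0, 100), (100, 200), (200, 300), (500, 600)]
print(merge_ranges(children))          # [(0, 300), (500, 600)]
print(default_accepts(350, children))  # True: falls in the gap
print(default_accepts(150, children))  # False: a child covers it
```

With no gaps, the merged list collapses to a single interval, so the
default's check reduces to a min/max comparison. For list partitioning there
is nothing to merge, which is why I expect that case to grow with every
child added.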

>
> > Actually, thinking on this more, I realized this does again come back to
> > the lack of a global index. Without the constraint, data could be put
> > directly into the default that could technically conflict with the
> > partition scheme elsewhere. Perhaps, instead of the constraint, inserts
> > directly to the default could be prevented on the user level. Writing to
> > valid children directly certainly has its place, but I've been thinking
> > about it, and I can't see any reason why one would ever want to write
> > directly to the default. Its use case seems to be around being a sort of
> > temporary storage until that data can be moved to a valid location. Would
> > still need to allow removal of data, though.
>
> As mentioned above, the default partition will not allow directly
> inserting a row whose key maps to some existing (non-default) partition.
>
> As far as tuple-routing is concerned, it will choose the default partition
> only if no other partition is found for the key. Tuple-routing doesn't
> use the partition constraints directly per se, like one of the two things
> mentioned above do. One could say that tuple-routing assigns the incoming
> rows to partitions such that their individual partition constraints are
> not violated.
>
>
> Finally, we don't yet offer global guarantees for constraints like unique.
> The only guarantee that's in place is that no two partitions can contain
> the same partition key.
>
> Thanks,
> Amit
>
>
>
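
The two behaviours described in the quoted text above (tuple routing picks
the default only when no other partition matches, and the default's
constraint rejects direct inserts of keys that another partition covers) can
be sketched roughly as follows. This is a toy model of list partitioning in
Python, not the server's code; route, default_constraint_ok, and the
partition names are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set


def route(key: Hashable,
          partitions: Dict[str, Set[Hashable]],
          default: Optional[str] = None) -> str:
    """Pick the partition for a key; use the default only as a last resort."""
    for name, accepted in partitions.items():
        if key in accepted:
            return name
    if default is None:
        # No matching partition and no default declared: the row is rejected.
        raise LookupError(f"no partition found for key {key!r}")
    return default


def default_constraint_ok(key: Hashable,
                          partitions: Dict[str, Set[Hashable]]) -> bool:
    """Direct insert into the default is allowed only if no child owns the key."""
    return all(key not in accepted for accepted in partitions.values())


parts = {"p_a": {1, 2}, "p_b": {3}}
print(route(3, parts, default="p_default"))   # p_b
print(route(7, parts, default="p_default"))   # p_default
print(default_constraint_ok(3, parts))        # False: p_b already owns key 3
print(default_constraint_ok(7, parts))        # True
```

Routing never consults the default's constraint, yet every row it places
there satisfies it, which matches the point that tuple routing assigns rows
so that partition constraints are not violated.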
