Re: Range types

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range types
Date: 2009-12-16 17:31:17
Message-ID: 1260984677.15987.1415.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote:
> So basically I have an anyrange pseudo type with the functions prev,
> next, last, etc defined. So instead of hard coding range types, we would
> allow the user to define their own range types. Basically if we are able
> to determine the previous and next values of the base types we'd be able
> to define a range type. I'm envisioning in a manner much like defining
> an enum type.

After an off-list discussion with Scott, I think there may be a solution
here that works for everyone if we don't try so hard to unify the
implementation of discrete and continuous ranges. The API should be very
similar, of course, but the implementation doesn't need to be.

Continuous ranges absolutely require the following information: start,
end, and inclusivity information.

But discrete ranges can instead be stored by counting the number of
granules from the start point. For instance, it could be stored as:
start, num_granules.

That has a lot of benefits for discrete ranges of time. First of all, it
allows the algebra to work reasonably well for the "days" and "months"
part of the interval, so we can allow a granule of 1 day/week/month/year
for a timestamp range. For output of the range, we can then just
multiply the granule by the number of granules, and add that to the
start time; thus avoiding the "incremental addition" problem with date
math. I think this works reasonably well for timestamp/date ranges --
let me know if there is a problem here (aside from timestamptz, which I
address below).

Secondly, in the case of a timestamp range, we can use 7 bytes for
storing the number of granules rather than another full 8-byte
timestamp, leaving one byte for flags to represent NULL boundaries,
infinite boundaries, etc. For timestamps that would still mean that an
interval could be 2000 years long with '1 microsecond' granularity. For
dates, 3 bytes is sufficient for a date range 45000 years long with
granules of '1 day'. That means that we can get back down to a 16 byte
representation for timestamp ranges, or 8 byte representation for date
ranges. There are a few details, like infinite ranges, but those can be
pretty easily solved with flags as well.

There's one problem, and that's for timestamptz ranges with intervals
that include days and months. Timezone adjustments are just not
well-defined for that kind of granule (nor would it be particularly
useful even if it magically worked), so this would have to be blocked
somehow. I think that's a special case, and we could provide the user
with a nice error message telling the user to use a date or timestamp
range instead.

So, the idea is to default to a continuous range type, but if the user
supplies a granule, prior and next functions, and other necessary
details, then it becomes a discrete range type.

* continuous ranges can still have everything that everyone wants,
including flags to indicate special values.
* discrete range granule is specified explicitly, so it's not an
"implementation detail"
* discrete ranges can have a compact representation
* discrete ranges would still have room for flags to indicate special
values

Comments?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-12-16 17:42:06 Re: Range types
Previous Message Greg Smith 2009-12-16 17:28:23 Re: PostgreSQL project policy compendium