Re: SQL:2011 application time

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 application time
Date: 2024-07-09 07:15:39
Message-ID: CACJufxHPQaD+q6o_AMg1PG6ETOWC=VqzN_=6QV9C5zbR8j=Tww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 6, 2024 at 4:56 AM Paul Jungwirth
<pj(at)illuminatedcomputing(dot)com> wrote:
>
> On 5/21/24 11:27, Isaac Morland wrote:
> > On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas(at)gmail(dot)com <mailto:robertmhaas(at)gmail(dot)com>> wrote:
> >
> > What I think is less clear is what that means for temporal primary
> > keys. As Paul pointed out upthread, in every other case, a temporal
> > primary key is at least as unique as a regular primary key, but in
> > this case, it isn't. And someone might reasonably think that a
> > temporal primary key should exclude empty ranges just as all primary
> > keys exclude nulls. Or they might think the opposite.
> >
> >
> > Fascinating. I think you're absolutely right that it's clear that two empty intervals don't
> > conflict. If somebody wants to claim two intervals conflict, they need to point to at least one
> > instant in time that is common between them.
> >
> > But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items
> > are identified by a time range, non-overlapping or not, then the empty range can only identify one
> > item (per value of whatever other columns are in the primary key). I think for a unique key the
> > non-overlapping restriction has to be considered an additional restriction on top of the usual
> > uniqueness restriction.
> >
> > I suspect in many applications there will be a non-empty constraint; for example, it seems quite
> > reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed
> > they should behave in the mathematically appropriate way.
>
> Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to
> Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think
> is the best approach.
>
> First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion
> constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps
> anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the
> exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have
> duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance:
>
> CREATE EXTENSION btree_gist;
> CREATE TABLE t (id int, valid_at daterange, name text);
> ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
> INSERT INTO t VALUES (1, 'empty', 'foo');
> INSERT INTO t VALUES (1, 'empty', 'bar');
>
> Multiranges have the same problem. So what do we do about that?
>
> **Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an
> empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true
> at no time? But of course it will happen anyway. It's a footgun and will break expectations for at
> least some.
>
> It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize
> that `name` is a functional dependency on the PK, so we allow it and give you the first row matching
> each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness
> to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could.
> PKs that are not literally unique seems like something that would cause headaches for years.
>
> **Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges.
> Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up
> with new ways to use things. For instance one way to use empty ranges is to reference a temporal
> table from a non-temporal table, since `'empty' <@ anything` is always true (though this has
> questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints
> too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD,
> startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way.
> Finally, if UNIQUEs do allow empties, we still risk confusing our planner.
>
> My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies.
> It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK
> index or may generate a new one. And what if the user already created the same constraint? Seeing
> all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this
> approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected.
>
> Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs.
> Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add
> a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But
> that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD
> is a lot simpler than attaching it to the PK.
>
> **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does*
> overlap another empty range. Empty ranges should still not overlap anything else. This would fix the
> exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to
> people who want to use them. (We would still forbid them if you define a PERIOD, because those come
> with the CHECK constraint mentioned above.)
> And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps
> something small (something with zero width) but not something big. Surely if a && b and b <@ c, then
> a && c? So this feels like the kind of elegant hack that you eventually regret.
>
> **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the
> executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above.
> Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users
> don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their
> own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for
> any table with constraints where conperiod is true. We'd also need to do this check on existing rows
> when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have
> relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want
> relperiods.
>
> **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least
> not in reference to the empties problem. But I was thinking about this request from Matthias for
> temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has
> indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE
> indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something
> muddled about all that. So how about we give the GiST AM handler amcanunique?
>
I think we can Forbid empties,not not mess with pg_class.

to make the communication smooth, i've set the base commit to
46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
{Add temporal PRIMARY KEY and UNIQUE constraints}
https://git.postgresql.org/cgit/postgresql.git/commit/?id=46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
you can git reset --hard 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
then apply the attached patch.

I hope I understand it correctly.
previously revert is only because the special value: empty.
i tried to use the operator &&&, new gist strategy number, pg_amop
entry to solve the problem.
Now with the applied patch, if the range column is specified WITHOUT OVERLAPS,
then this column is not allowed to have any empty range value.

logic work through:
* duplicate logic of range_overlaps but disallow empty value. also
have the operator &&&, (almost equivalent to &&)
* add new gist strategy number
* thanks to add stratnum GiST support function
(https://git.postgresql.org/cgit/postgresql.git/commit/?id=6db4598fcb82a87a683c4572707e522504830a2b)
now we can set the strategy number to the mapped new function
(equivalent to range_overlaps, but error out empty value)
* in ComputeIndexAttrs, set the strategy number to the newly created
StrategyNumber in "else if (iswithoutoverlaps)" block.
* Similarly refactor src/backend/utils/adt/rangetypes_gist.c make the
index value validation using newly created function.

function name, error message maybe not great now, but it works.
------full demo, also see the comments.
DROP TABLE if exists temporal_rng;
CREATE TABLE temporal_rng (id int4range, valid_at tsrange);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
--should be fine.
INSERT INTO temporal_rng VALUES ('empty', '[2022-01-01,2022-01-02]');
--will error out, period column, empty range not allowed
INSERT INTO temporal_rng VALUES ('[3,3]', 'empty');

ALTER TABLE temporal_rng DROP CONSTRAINT temporal_rng_pk;
--period constraint dropped, now should be fine.
INSERT INTO temporal_rng VALUES ('[3,3]', 'empty');

--reinstall constraint, should error out
--because existing one row has empty value.
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
delete from temporal_rng where id = '[3,3]';

--reinstall constraint, should be fine, because empty value removed.
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);

Attachment Content-Type Size
v1-0001-forbid-empty-range-value-for-period-column.patch text/x-patch 16.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-07-09 07:19:35 Re: Doc Rework: Section 9.16.13 SQL/JSON Query Functions
Previous Message ikedarintarof 2024-07-09 06:56:34 Re: doc: modify the comment in function libpqrcv_check_conninfo()