Re: cataloguing NOT NULL constraints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, exclusion(at)gmail(dot)com, dean(dot)a(dot)rasheed(at)gmail(dot)com, andrewbille(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: cataloguing NOT NULL constraints
Date: 2024-05-13 15:14:42
Message-ID: CA+TgmoZg9_h=CQUJfKuf3XCiTRY+p4abJCeW4vMgTtbo-UtkcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 13, 2024 at 9:44 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> The problematic point is the need to add NOT NULL constraints during
> table creation that don't exist in the table being dumped, for
> performance of primary key creation -- I called this a throwaway
> constraint. We needed to be able to drop those constraints after the PK
> was created. These were marked NO INHERIT to allow them to be dropped,
> which is easier if the children don't have them. This all worked fine.

This seems really weird to me. Why is it necessary? I mean, in
existing releases, if you declare a column as PRIMARY KEY, the columns
included in the key are forced to be NOT NULL, and you can't change
that for so long as they are included in the PRIMARY KEY. So I would
have thought that after this patch, you'd end up with the same thing.
One way of doing that would be to make the PRIMARY KEY depend on the
now-catalogued NOT NULL constraints, and the other way would be to
keep it as an ad-hoc prohibition, same as now. In PostgreSQL 16, I get
a dump like this:

CREATE TABLE public.foo (
a integer NOT NULL,
b text
);

COPY public.foo (a, b) FROM stdin;
\.

ALTER TABLE ONLY public.foo
ADD CONSTRAINT foo_pkey PRIMARY KEY (a);

If I'm dumping from an existing release, I don't see why any of that
needs to change. The NOT NULL decoration should lead to a
system-generated constraint name. If I'm dumping from a new release,
the NOT NULL decoration needs to be replaced with CONSTRAINT
existing_constraint_name NOT NULL. But I don't see why I need to end
up with what the patch generates, which seems to be something like
CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT. That kind
of thing suggests that we're changing around the order of operations
in pg_dump, probably by adding the NOT NULL constraints at a later
stage than currently, and I think the proper solution is most likely
to be to avoid doing that in the first place.

> However, at some point we realized that we needed to add NOT NULL
> constraints in child tables for the columns in which the parent had a
> primary key. Then things become messy because we had the throwaway
> constraints on one hand and the not-nulls that descend from the PK on
> the other hand, where one was NO INHERIT and the other wasn't; worse if
> the child also has a primary key.

This seems like another problem that is created by changing the order
of operations in pg_dump.

> > The other possibility that occurs to me is that I think the motivation
> > for cataloging NOT NULL constraints was that we wanted to be able to
> > track dependencies on them, or something like that, which seems like
> > it might be able to create issues of the type that you're facing, but
> > the details aren't clear to me.
>
> NOT VALID constraints would be extremely useful, for one thing (because
> then you don't need to exclusively-lock the table during a long scan in
> order to add a constraint), and it's just one step away from having
> these constraints be catalogued. It was also fixing some inconsistent
> handling of inheritance cases.

I agree that NOT VALID constraints would be very useful. I'm a little
scared by the idea of fixing inconsistent handling of inheritance
cases, just for fear that there may be more things relying on the
inconsistent behavior than we realize. I feel like this is an area
where it's easy for changes to be scarier than they at first seem. I
still have memories of discovering some of the current behavior back
in the mid-2000s when I was learning PostgreSQL (and databases
generally). It struck me as fiddly back then, and it still does. I
feel like there are probably some behaviors that look like arbitrary
decisions but are actually very important for some undocumented
reason. That's not to say that we shouldn't try to make improvements,
just that it may be hard to get right.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Frédéric Yhuel 2024-05-13 15:14:53 Re: New GUC autovacuum_max_threshold ?
Previous Message Heikki Linnakangas 2024-05-13 14:54:30 Re: Direct SSL connection with ALPN and HBA rules