Re: Validating check constraints without a table scan?

From: Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
To: Philip Couling <couling(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Validating check constraints without a table scan?
Date: 2024-11-15 15:28:06
Message-ID: CAKkG4_=pL3s0cpr4d4hWcobRcdBF0=73RMcasiSz8vM9dcubxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Nov 15, 2024 at 9:38 AM Philip Couling <couling(at)gmail(dot)com> wrote:

> Is there a solid reason why adding a check constraint does not use
> existing indexes for validation.
>
> We are currently looking at partitioning a multi TB table leaving all
> existing data in place and simply attaching it as a partition to a new
> table. To prevent locking, we are trying to add an INVALID check constraint
> first and then validate it.
>
> I can trivially prove the invalid constraint is valid with a simple SELECT
> which will use an existing index and return instantaneously. But AFAIK
> Theres no way to mark a constraint as valid without scanning all the rows.
>

Most likely your query is not exactly the same as the check constraint.
Think about NULL and similar.

> This operation is really problematic on a production database with heavy
> IO load.
>
> Is there a solid ready why validating check constraints cannot use
> existing indexes? If I can prove the constraint is valid so trivially with
> a SELECT, then why can Postgres not do the same (or similar)?
>

Here is what has worked for me many times:

1. create the check constraint as NOT VALID. From now on no new or updated
row is allowed to violate it.
2. check if the constraint holds with a query on a binary replica. Make
sure the query starts only when the constraint is visible on the replica.
3. on the master: UPDATE pg_constraint SET conisvalidated=true WHERE
conname='your_constraint_name' AND conrelid='schema.table'::regclass

How you perform step 2 is irrelevant. Checking it on a replica would simply
avoid the load on the master. You just need to make sure there is no
conflicting data in the table.

WARNING, you need to be damn sure of your data if you do that. But if you
are, it works.

Here is the procedure how I solved the same problem for some of our
multi-TB tables (PG14):

The table has a column called transaction_time. We wanted to partition by
that column. For some historical reason the column did not have a NOT NULL
constraint. However, there was no way our processes could insert NULL in
that column and there was no row with NULL in that field. So, first was to
add the NOT NULL constraint:

BEGIN;

ALTER TABLE my.table
ADD CONSTRAINT transaction_time_not_null CHECK(transaction_time IS NOT
NULL) NOT VALID;

UPDATE pg_constraint
SET convalidated=true
WHERE conname = 'transaction_time_not_null'
AND conrelid = 'my.table'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;

COMMIT;

Now for cosmetic purposes we first turn the check constraint above into a
normal NOT NULL constraint:

BEGIN;

SET LOCAL client_min_messages = 'debug4';
-- expecting this message
-- DEBUG: existing constraints on column "table.transaction_time" are
sufficient to prove that it does not contain nulls
ALTER TABLE my.table
ALTER COLUMN transaction_time SET NOT NULL;
RESET client_min_messages;

ALTER TABLE my.table
DROP CONSTRAINT transaction_time_not_null;

COMMIT;

If you set client_min_messages to something like debug4, then the database
tells you if it wants to scan the table or if existing constraints are
sufficient to prove the condition.

transaction_time in our case is never in the future. Also database
transactions are short-lived. Nothing exceeds 30 seconds. So, I can trust
the effect of any action performed more than 30 seconds ago in the database
is visible.

So, I set the time after which new rows go to the new partition at least 10
minutes from now at the next hour boundary. 30 seconds would be good
enough. I chose 10 minutes just for extra safety.

SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS
switch_time\gset

Next comes the actual change:

BEGIN;

-- rename the existing table
ALTER TABLE my.table RENAME TO table_old;

-- drop triggers. We will recreate them later.
DROP TRIGGER ... ON my.table_old;
DROP TRIGGER ...;

-- create partitioned table
CREATE TABLE my.table (
LIKE my.table_old
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS
)
PARTITION BY RANGE (transaction_time);

-- recreate triggers
CREATE TRIGGER ...
BEFORE DELETE ON my.table
FOR EACH ROW EXECUTE FUNCTION my,trigger_fun();
CREATE TRIGGER ...;

-- create the partition for future rows
CREATE TABLE my.table_current PARTITION OF my.table
FOR VALUES FROM (:'switch_time') TO ('infinity');

-- and some indexes and FK
ALTER TABLE my.table_current ADD PRIMARY KEY (id);
CREATE INDEX ON my.table_current (...);
ALTER TABLE my.table_current
ADD CONSTRAINT fk_name
FOREIGN KEY (...)
REFERENCES ...(...) ON UPDATE ... ON DELETE ...;

-- now attach the old table as partition
ALTER TABLE my.table_old
ADD CONSTRAINT partition_boundaries
CHECK('-infinity'<=transaction_time AND transaction_time<:'switch_time')
NOT VALID;

-- for procedural reasons we know the constraint is valid. Let's make PG
believe it too.
UPDATE pg_constraint
SET convalidated=true
WHERE conname='partition_boundaries'
AND conrelid='my.table_old'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;

-- now attach it. We use again debug4 to check if the table scan is avoided.
SET LOCAL client_min_messages = 'debug4';
-- expecting
-- DEBUG: partition constraint for table "table_old" is implied by
existing constraints
ALTER TABLE my.table ATTACH PARTITION my.table_old
FOR VALUES FROM ('-infinity') TO (:'switch_time');
RESET client_min_messages;

-- drop the now unnecessary constraint
ALTER TABLE my.table_old
DROP CONSTRAINT partition_boundaries;

COMMIT;

Once the new partition gets too full, we will use a similar procedure to
adjust the partition boundary of the new partition and then create the next
partition.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-11-15 15:47:41 Re: DB Files
Previous Message Andy Hartman 2024-11-15 14:27:36 DB Files