From: Philip Couling <couling(at)gmail(dot)com>
To: Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Validating check constraints without a table scan?
Date: 2024-11-27 12:05:59
Message-ID: CANWftz+7+Jt459UC3dV=d0tExtCacDr1KkjCmF7FmSpPamwgYA@mail.gmail.com
Lists: pgsql-general
It looks like updating pg_constraint isn't an option on AWS RDS, since AWS
doesn't give you superuser access.
Thanks a lot for the suggestion anyway.
On Fri, 15 Nov 2024 at 15:28, Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
wrote:
> On Fri, Nov 15, 2024 at 9:38 AM Philip Couling <couling(at)gmail(dot)com> wrote:
>
>> Is there a solid reason why adding a check constraint does not use
>> existing indexes for validation?
>>
>> We are currently looking at partitioning a multi-TB table, leaving all
>> existing data in place and simply attaching it as a partition to a new
>> table. To prevent locking, we are trying to add a NOT VALID check
>> constraint first and then validate it.
>>
>> I can trivially prove the invalid constraint is valid with a simple
>> SELECT which will use an existing index and return instantaneously. But
>> AFAIK there's no way to mark a constraint as valid without scanning all
>> the rows.
>>
>
> Most likely your query is not exactly the same as the check constraint.
> Think about NULL and similar.
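>
> For example (a sketch with a hypothetical table t): a CHECK constraint is
> violated only by rows where the expression evaluates to FALSE; a NULL
> result counts as passing. A range partition constraint, by contrast, also
> requires the column to be non-NULL, so the two are not equivalent:
>
> CREATE TEMP TABLE t (ts timestamptz);
> INSERT INTO t VALUES (NULL);
> -- succeeds: the expression evaluates to NULL for that row, not FALSE
> ALTER TABLE t ADD CONSTRAINT c CHECK (ts < '2024-01-01');
> -- returns 0, so it "proves" the constraint while saying nothing about NULLs
> SELECT count(*) FROM t WHERE ts >= '2024-01-01';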
>
>
>> This operation is really problematic on a production database with heavy
>> IO load.
>>
>> Is there a solid reason why validating check constraints cannot use
>> existing indexes? If I can prove the constraint is valid so trivially with
>> a SELECT, then why can Postgres not do the same (or similar)?
>>
>
> Here is what has worked for me many times:
>
> 1. Create the check constraint as NOT VALID. From now on, no new or
> updated row is allowed to violate it.
> 2. Check that the constraint holds with a query on a binary replica. Make
> sure the query starts only once the constraint is visible on the replica.
> 3. On the master: UPDATE pg_constraint SET convalidated=true WHERE
> conname='your_constraint_name' AND conrelid='schema.table'::regclass;
>
> How you perform step 2 is irrelevant. Checking it on a replica would
> simply avoid the load on the master. You just need to make sure there is no
> conflicting data in the table.
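>
> For step 2, the query might look like this (a sketch; the table name and
> check expression are placeholders):
>
> -- a row violates a CHECK constraint only if its expression evaluates to FALSE
> SELECT count(*) FROM schema.table WHERE (your_check_condition) IS FALSE;
> -- expect 0 before touching pg_constraint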
>
> WARNING, you need to be damn sure of your data if you do that. But if you
> are, it works.
>
> Here is the procedure I used to solve the same problem for some of our
> multi-TB tables (PG14):
>
> The table has a column called transaction_time. We wanted to partition by
> that column. For some historical reason the column did not have a NOT NULL
> constraint. However, there was no way our processes could insert NULL in
> that column, and there was no row with NULL in that field. So the first
> step was to add the NOT NULL constraint:
>
> BEGIN;
>
> ALTER TABLE my.table
>     ADD CONSTRAINT transaction_time_not_null
>     CHECK (transaction_time IS NOT NULL) NOT VALID;
>
> UPDATE pg_constraint
>    SET convalidated = true
>  WHERE conname = 'transaction_time_not_null'
>    AND conrelid = 'my.table'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> COMMIT;
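>
> Before the COMMIT, it is worth double-checking the claim directly (a
> sketch; expect a count of 0, otherwise ROLLBACK):
>
> SELECT count(*) FROM my.table WHERE transaction_time IS NULL;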
>
> Now, for cosmetic purposes, we first turn the check constraint above into
> a normal NOT NULL constraint:
>
> BEGIN;
>
> SET LOCAL client_min_messages = 'debug4';
> -- expecting this message:
> -- DEBUG: existing constraints on column "table.transaction_time" are sufficient to prove that it does not contain nulls
> ALTER TABLE my.table
>     ALTER COLUMN transaction_time SET NOT NULL;
> RESET client_min_messages;
>
> ALTER TABLE my.table
>     DROP CONSTRAINT transaction_time_not_null;
>
> COMMIT;
>
> If you set client_min_messages to something like debug4, then the database
> tells you if it wants to scan the table or if existing constraints are
> sufficient to prove the condition.
>
> In our case transaction_time is never in the future, and database
> transactions are short-lived; nothing exceeds 30 seconds. So I can trust
> that the effect of any action performed more than 30 seconds ago is
> visible in the database.
>
> So I set the switch time, after which new rows go to the new partition, to
> the next hour boundary at least 10 minutes from now. 30 seconds would be
> good enough; I chose 10 minutes just for extra safety.
>
> SELECT date_trunc('hour', now() + '1h 10min'::interval)::timestamp AS switch_time \gset
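>
> (In psql, \gset stores each output column of the query in a psql variable
> of the same name; the value is then available as :'switch_time', which
> interpolates it as a quoted literal.)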
>
> Next comes the actual change:
>
> BEGIN;
>
> -- rename the existing table
> ALTER TABLE my.table RENAME TO table_old;
>
> -- drop triggers. We will recreate them later.
> DROP TRIGGER ... ON my.table_old;
> DROP TRIGGER ...;
>
> -- create the partitioned table
> CREATE TABLE my.table (
>     LIKE my.table_old
>     INCLUDING DEFAULTS
>     INCLUDING CONSTRAINTS
> )
> PARTITION BY RANGE (transaction_time);
>
> -- recreate triggers
> CREATE TRIGGER ...
>     BEFORE DELETE ON my.table
>     FOR EACH ROW EXECUTE FUNCTION my.trigger_fun();
> CREATE TRIGGER ...;
>
> -- create the partition for future rows
> CREATE TABLE my.table_current PARTITION OF my.table
>     FOR VALUES FROM (:'switch_time') TO ('infinity');
>
> -- and some indexes and FKs
> ALTER TABLE my.table_current ADD PRIMARY KEY (id);
> CREATE INDEX ON my.table_current (...);
> ALTER TABLE my.table_current
>     ADD CONSTRAINT fk_name
>     FOREIGN KEY (...)
>     REFERENCES ...(...) ON UPDATE ... ON DELETE ...;
>
> -- now attach the old table as a partition
> ALTER TABLE my.table_old
>     ADD CONSTRAINT partition_boundaries
>     CHECK ('-infinity' <= transaction_time AND transaction_time < :'switch_time')
>     NOT VALID;
>
> -- for procedural reasons we know the constraint is valid. Let's make PG believe it too.
> UPDATE pg_constraint
>    SET convalidated = true
>  WHERE conname = 'partition_boundaries'
>    AND conrelid = 'my.table_old'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> -- now attach it. We again use debug4 to check that the table scan is avoided.
> SET LOCAL client_min_messages = 'debug4';
> -- expecting
> -- DEBUG: partition constraint for table "table_old" is implied by existing constraints
> ALTER TABLE my.table ATTACH PARTITION my.table_old
>     FOR VALUES FROM ('-infinity') TO (:'switch_time');
> RESET client_min_messages;
>
> -- drop the now-unnecessary constraint
> ALTER TABLE my.table_old
>     DROP CONSTRAINT partition_boundaries;
>
> COMMIT;
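>
> Afterwards, a quick sanity check that both partitions are attached (a
> sketch using the system catalog):
>
> SELECT inhrelid::regclass AS partition
>   FROM pg_inherits
>  WHERE inhparent = 'my.table'::regclass;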
>
> Once the current partition gets too full, we will use a similar procedure
> to adjust its upper boundary and then create the next partition.
>