More efficient RI checks - take 2

From: Antonin Houska <ah(at)cybertec(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: More efficient RI checks - take 2
Date: 2020-04-08 16:38:01
Message-ID: 1813.1586363881@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

Comments are welcome.

Setup
=====

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);

Insert many rows into the FK table
==================================

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=32.741..32.741 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.403..4.802 rows=16384 loops=1)
Planning Time: 0.050 ms
Trigger for constraint f_i_fkey: time=448.986 calls=16384
Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=34.053..34.053 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.223..4.448 rows=16384 loops=1)
Planning Time: 0.047 ms
Trigger for constraint f_i_fkey: time=105.164 calls=8
Execution Time: 141.201 ms

Insert a single row into the FK table
=====================================

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.026 ms
Trigger for constraint f_i_fkey: time=0.435 calls=1
Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.025 ms
Trigger for constraint f_i_fkey: time=0.578 calls=1
Execution Time: 0.670 ms

Check if FK row exists during deletion from the PK
==================================================

master:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms

Cascaded DELETE --- many PK rows
================================

DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=38.334..38.334 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.019..3.925 rows=16384 loops=1)
Planning Time: 0.049 ms
Trigger for constraint f_i_fkey: time=31348.756 calls=16384
Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=33.360..33.360 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.012..3.183 rows=16384 loops=1)
Planning Time: 0.094 ms
Trigger for constraint f_i_fkey: time=9.580 calls=8
Execution Time: 43.941 ms

Cascaded DELETE --- a single PK row
===================================

INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 5.754 ms

patched:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 8.098 ms

Cascaded UPDATE - many rows
===========================

master:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.80 rows=16384 width=10) (actual time=166.954..166.954 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.80 rows=16384 width=10) (actual time=0.013..7.780 rows=16384 loops=1)
Planning Time: 0.177 ms
Trigger for constraint f_i_fkey on p: time=60405.362 calls=16384
Trigger for constraint f_i_fkey on f: time=455.874 calls=16384
Execution Time: 61036.996 ms

patched:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.77 rows=16382 width=10) (actual time=159.512..159.512 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.77 rows=16382 width=10) (actual time=0.014..7.783 rows=16382 loops=1)
Planning Time: 0.146 ms
Trigger for constraint f_i_fkey on p: time=169.628 calls=9
Trigger for constraint f_i_fkey on f: time=124.079 calls=2
Execution Time: 456.072 ms

Cascaded UPDATE - a single row
==============================

master:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 4.858 ms

patched:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 11.955 ms

[1] https://commitfest.postgresql.org/22/1975/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
v01-0001-Check-for-RI-violation-outside-ri_PerformCheck.patch text/x-diff 2.4 KB
v01-0002-Changed-ri_GenerateQual-so-it-generates-the-whole-qu.patch text/x-diff 13.8 KB
v01-0003-Return-early-from-ri_NullCheck-if-possible.patch text/x-diff 1011 bytes
v01-0004-Process-multiple-RI-trigger-events-at-a-time.patch text/x-diff 79.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-08 16:39:53 Re: Commitfest 2020-03 Now in Progress
Previous Message David Steele 2020-04-08 16:36:37 Re: Commitfest 2020-03 Now in Progress