From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL |
Date: | 2020-11-02 12:08:19 |
Message-ID: | CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
here is another patch related to using CALL statement inside PL/pgSQL code.
A repeated using of CALL statement is expensive. How much?
I wrote synthetic test:
CREATE TABLE foo(a int, b int, c int);
CREATE OR REPLACE PROCEDURE public.simple_proc3(a integer, b integer, c
integer, cnt int, OUT r boolean)
AS $$
BEGIN
INSERT INTO foo VALUES(a, b, c);
IF cnt % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
CALL simple_proc3(a, b, c, i, r);
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;
I try to insert 10M rows with commit after inserting 10K rows. Execution
time on master is ~ 6 minutes 368251,691 ms (06:08,252)
DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
INSERT INTO foo VALUES(a, b, c);
IF i % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;
When I try to remove CALL statement then same work needs less to 2 minutes
99109,511 ms (01:39,110). So this code is three times slower with calling
one procedure. There are two significant parts of overhead:
a) internal implementation of CALL statement that doesn't use plan cache
well, and it does lot of expensive operations over pg_proc catalogue,
b) wrapper in PL/pgSQL that repeatedly repearse expression string.
Overhead of PL/pgSQL can be reduced by using plan cache after fixing issue
with resource owner. I did it, and I introduced "local_resowner" for
holding references of plans for CALL statement expressions.
After patching the execution time is reduced to 4 minutes Time: 245852,846
ms (04:05,853). Still the overhead is significant, but it is 30% speedup.
The best case for this patch is about 2x times better performance
CREATE OR REPLACE PROCEDURE public.simple_proc2(a integer, b integer, c
integer, cnt int, OUT r boolean)
AS $$
BEGIN
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE a int; r boolean;
BEGIN
FOR i IN 1..10000000
LOOP
CALL simple_proc2((random()*100)::int, (random()*100)::int,
(random()*100)::int, i, r);
END LOOP;
END;
$$;
Time: 184667,970 ms (03:04,668), master: Time: 417463,457 ms (06:57,463)
On second hand, the worst case is about 10% (probably this overhead can be
reduced by creating "local_resowner" only when it is necessary)
CREATE OR REPLACE FUNCTION simple_fx2(a int)
RETURNS int AS $$
BEGIN
RETURN a + a;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
DO $$
DECLARE a int;
BEGIN
FOR i IN 1..10000000
LOOP
a := simple_fx2(i);
END LOOP;
END;
$$;
Time: 5434,808 ms (00:05,435) , master: Time: 4632,762 ms (00:04,633)
Comments, notes, ideas?
Regards
Pavel
Attachment | Content-Type | Size |
---|---|---|
plpgsql-using-local-resowner-for-call-plans.patch | text/x-patch | 31.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2020-11-02 12:24:12 | Re: Split copy.c |
Previous Message | Heikki Linnakangas | 2020-11-02 11:52:48 | Re: hash_array_extended() needs to pass down collation |