From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Partitioning performance: cache stringToNode() of pg_constraint.ccbin |
Date: | 2013-06-03 19:07:27 |
Message-ID: | 20130603190727.GA360354@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
A colleague, Korry Douglas, observed a table partitioning scenario where
deserializing pg_constraint.ccbin is a hot spot. The following test case, a
simplification of a typical partitioning setup, spends 28% of its time in
stringToNode() and callees thereof:
\timing on
\set n 600000
BEGIN;
CREATE TABLE bench_check_constr_parent (c int);
CREATE TABLE bench_check_constr_child (
CHECK (c > 0 AND c <= 100000000)
) INHERITS (bench_check_constr_parent);
CREATE FUNCTION trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO bench_check_constr_child VALUES (NEW.*);
RETURN NULL;
END
$$;
CREATE TRIGGER redir BEFORE INSERT ON bench_check_constr_parent
FOR EACH ROW EXECUTE PROCEDURE trig();
-- Main benchmark
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
-- Compare direct insert performance @ 10x volume
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
ROLLBACK;
The executor caches each CHECK constraint in ResultRelInfo as a planned
expression. That cache is highly effectively for long-running statements, but
the trivial INSERTs effectively work without a cache. Korry devised this
patch to cache the stringToNode() form of the constraint in the relcache. It
improves the benchmark's partitioned scenario by 33%:
-- Timings (seconds) --
master, INSERT parent: 14.2, 14.4, 14.4
patched, INSERT parent: 9.6, 9.7, 9.7
master, INSERT*10 child: 9.9, 9.9, 10.2
patched, INSERT*10 child: 10.0, 10.2, 10.2
There's still not much to like about that tenfold overhead from use of the
partition routing trigger, but this patch makes a nice cut into that overhead
without doing anything aggressive. The profile no longer shows low-hanging
fruit; running an entire SQL statement per row piles on the runtime from a
wide range of sources. For anyone curious, I've attached output from "perf
report -s parent -g graph,1,caller" with the patch applied; I suggest browsing
under "less -S".
Some call sites need to modify the node tree, so the patch has them do
copyObject(). I ran a microbenchmark of copyObject() on the cached node tree
vs. redoing stringToNode(), and copyObject() still won by a factor of four.
Thanks,
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
checkconstr-cache-v1.patch | text/plain | 7.5 KB |
partition-trigger-cost.txt | text/plain | 56.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Martijn van Oosterhout | 2013-06-03 19:48:26 | Re: Vacuum, Freeze and Analyze: the big picture |
Previous Message | Simon Riggs | 2013-06-03 19:00:19 | Re: Optimising Foreign Key checks |