From: | Tony Cebzanov <tonyceb(at)andrew(dot)cmu(dot)edu> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Performance problem with row count trigger |
Date: | 2009-04-02 14:22:50 |
Message-ID: | 49D4CA3A.3050005@andrew.cmu.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
I was looking to speed up a count(*) query, as per the recommendations
on the postgres wiki:
http://wiki.postgresql.org/wiki/Slow_Counting
I decided to use the trigger approach to get an accurate count that
doesn't depend on VACUUM being run recently. I've got it working, but
the addition of the trigger slows things down so bad that it's not a
viable option. I was hoping for advice on how to speed things up, or at
least an explanation of why it gets so slow.
The relevant tables are as follows:
---------------------------------------------------------------------------
CREATE TABLE dataset(
dataset_id SERIAL PRIMARY KEY,
catalog_id INTEGER REFERENCES catalog (catalog_id) ON DELETE CASCADE,
t_begin TIMESTAMP WITHOUT TIME ZONE NULL,
t_end TIMESTAMP WITHOUT TIME ZONE NULL,
"ctime" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT now(),
"mtime" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT now(),
"assoc_count" BIGINT NOT NULL DEFAULT 0
)
CREATE TABLE assoc (
dataset_id INTEGER REFERENCES dataset (dataset_id) ON DELETE CASCADE,
range ip4r NOT NULL,
label_id INTEGER NULL,
value BIGINT NULL,
PRIMARY KEY (dataset_id, range),
UNIQUE (dataset_id, range, label_id)
);
---------------------------------------------------------------------------
What I want to do is update the assoc_count field in the dataset table
to reflect the count of related records in the assoc field. To do so, I
added the following trigger:
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION update_assoc_count_insert()
RETURNS TRIGGER AS
'
BEGIN
UPDATE dataset
SET assoc_count = assoc_count + 1
WHERE dataset_id = NEW.dataset_id;
RETURN NEW;
END
' LANGUAGE plpgsql;
CREATE TRIGGER assoc_update_assoc_count_insert AFTER INSERT ON assoc
FOR EACH ROW EXECUTE PROCEDURE update_assoc_count_insert();
---------------------------------------------------------------------------
(I also have triggers for UPDATE/DELETE, left out for brevity.)
The slowness I'm talking about doesn't show up for a single insert, but
arises when doing thousands of them in a transaction. Here are some
test runs of 10,000 inserts without the trigger in place:
---------------------------------------------------------------------------
1000 (2231.540142/s)
2000 (2341.849077/s)
3000 (2234.332303/s)
4000 (2311.247629/s)
5000 (2366.171695/s)
6000 (2400.028800/s)
7000 (2407.147716/s)
8000 (2416.419084/s)
9000 (2401.476107/s)
10000 (2406.870943/s)
---------------------------------------------------------------------------
The number in parens is the number of inserts per second for each batch
of 1,000 inserts. As you can see, performance isn't too terrible, and
is pretty constant from start to finish.
Now I add the trigger, and here's what happens:
---------------------------------------------------------------------------
1000 (1723.216901/s)
2000 (1613.529119/s)
3000 (1526.081496/s)
4000 (1431.907261/s)
5000 (1340.159570/s)
6000 (1269.746140/s)
7000 (1191.374990/s)
8000 (1117.332012/s)
9000 (1056.309389/s)
10000 (1001.051003/s)
---------------------------------------------------------------------------
The throughput of the first batch of 1,000 is diminished, but still
tolerable, but after 10,000 inserts, it's gotten much worse. This
pattern continues, to the point where performance is unacceptable after
20k or 30k inserts.
To rule out the performance of the trigger mechanism itself, I swapped
the trigger out for one that does nothing. The results were the same as
without the trigger (the first set of numbers), which leads me to
believe there's something about the UPDATE statement in the trigger that
is causing this behavior.
I then tried setting the assoc_count to a constant number instead of
trying to increment it with assoc_count = assoc_count + 1, but
performance was just as bad as with the proper UPDATE statement.
I'm not really sure where to go next. I can update the assoc_count once
at the end of the transaction (without a trigger), but this could lead
to inconsistencies if another client does inserts without updating the
count. I would really prefer to use the trigger solution recommended on
the PGsql wiki, but can't do so until I solve this performance problem.
I greatly appreciate any and all help. Thanks.
-Tony
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-04-02 15:00:42 | Re: SQL to Check whether "AN HOUR PERIOD" is between start and end timestamps |
Previous Message | Emi Lu | 2009-04-02 14:01:53 | ibatis with overlaps query |