From: Ben Liblit <liblit(at)eecs(dot)berkeley(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: huge performance penalty from constraint triggers
Date: 2002-08-02 01:52:41
Message-ID: 3D49E5E9.5070903@eecs.berkeley.edu
Lists: pgsql-general

I have a modest-sized PostgreSQL database, with about four hundred
thousand records across three tables. I am seeing an astonishing
variance in the time it takes to initially populate the database versus
the time it takes to restore it from a standard pg_dump archive. Initial
population takes nineteen hours, while restoring the dump takes a mere
twenty-one seconds!
The bulk data is the same in both cases. I am using the COPY command
when initially populating the database, as does the pg_dump archive.
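For concreteness, the bulk load itself is just one COPY per table,
reading tab-separated rows from the input stream, roughly like this (the
real table and column definitions are in the attached build-tables.sql):

    COPY sessions FROM stdin;
    COPY sites FROM stdin;
    COPY samples FROM stdin;
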
The key difference appears to be when CONSTRAINT TRIGGERs are created.
When I initially populate the database, I create one sequence and three
tables. Those tables have various primary keys, foreign keys,
references, and uniqueness constraints, from which PostgreSQL
immediately derives several indexes and constraint triggers. I then use
three COPY commands to populate the tables. Thus the COPY happens
*after* the indexes and triggers have been created.
The pg_dump archive orders things differently. It first creates the
sequence and tables, then has three COPY commands, and lastly creates
the indexes and triggers. Restoring from this dump takes twenty-one
seconds. If I reorder things manually so that the "CREATE UNIQUE INDEX"
commands come before the COPYs, restoration time grows slightly, to
twenty-five seconds. But if I move the "CREATE CONSTRAINT TRIGGER"
commands before the COPYs as well ... boom! Nineteen hours.
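In outline, the two orderings look like this. (This is just a sketch,
not the literal dump output; the CREATE CONSTRAINT TRIGGER commands that
pg_dump emits invoke PostgreSQL's built-in RI_FKey_* procedures.)

    -- pg_dump's order: restores in about twenty-one seconds
    CREATE SEQUENCE / CREATE TABLE ...
    COPY sessions FROM stdin;
    COPY sites FROM stdin;
    COPY samples FROM stdin;
    CREATE UNIQUE INDEX ...
    CREATE CONSTRAINT TRIGGER ... FOR EACH ROW
        EXECUTE PROCEDURE "RI_FKey_check_ins" (...)

    -- reordered to match my initial-population order: nineteen hours
    CREATE SEQUENCE / CREATE TABLE ...
    CREATE UNIQUE INDEX ...
    CREATE CONSTRAINT TRIGGER ...
    COPY sessions FROM stdin;
    COPY sites FROM stdin;
    COPY samples FROM stdin;
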
Obviously, putting the triggers after the data means less work, because
presumably those triggers will not be run for the bulk data. But twenty-one
seconds versus nineteen *hours*? That's a slowdown of more than
three orders of magnitude! A crude calculation of records inserted per
second gives us nineteen thousand records per second without triggers,
or a bit less than six per second with triggers.
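The arithmetic, using the row counts given below (one "sessions" row,
157,000 "sites" rows, and 244,000 "samples" rows, so 401,001 rows in all):

    SELECT 401001.0 / 21          AS rows_per_sec_without_triggers,  -- about 19,000
           401001.0 / (19 * 3600) AS rows_per_sec_with_triggers;     -- a bit under 6
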
I'm already using COPY instead of many INSERTs. I am aware of the
performance tip regarding dropping indexes before uploading large
amounts of data, and then regenerating the indexes after. But in this
case, index maintenance appears to be a relatively small factor, while
the triggers are killing me. Constraint triggers are sparsely
documented, and what documentation exists states that they are "not
intended for general use". So I'm not sure if dropping and recreating
the constraint triggers is advisable. It also defeats the purpose of
having them, of course, since they will not be present to maintain
referential integrity as new data is added to the system.
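If dropping and recreating them is nevertheless the accepted workaround,
I assume it would look something like the sketch below. (The trigger,
constraint, and column names here are invented purely for illustration.)

    -- hypothetical: drop the RI trigger on the big table before loading
    DROP TRIGGER some_ri_trigger ON samples;
    COPY samples FROM stdin;
    -- ... then restore the check afterward, e.g. by re-adding the
    -- foreign key so that its constraint triggers are recreated
    ALTER TABLE samples ADD CONSTRAINT samples_site_fkey
        FOREIGN KEY (session_id, site_num)
        REFERENCES sites (session_id, site_num);
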
Attached below please find the SQL commands I use to create the tables
(and implicitly, the unique indexes and constraint triggers). We have
three tables. In brief, each session has a large collection of numbered
sites, and each site has a handful of samples. So each row in the
"sessions" table has an ID. Each row in the "sites" table gives a
session ID and its own sample number within that session. Each row in
the "samples" table cites a session ID and sample number. In my current
test, the "sessions" table has one row; the "sites" table has 157,000
rows; and the "samples" table has 244,000 rows.
I'd be most grateful if anyone has suggestions for how to improve
things. I'm completely baffled.
Attachment: build-tables.sql (text/plain, 814 bytes)