Update performance ... is 200, 000 updates per hour what I should expect?

From: Erik Norvelle <erik(at)norvelle(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Update performance ... is 200, 000 updates per hour what I should expect?
Date: 2003-12-02 15:53:16
Message-ID: A461D6B8-24DF-11D8-BDFB-000A9583BF06@norvelle.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Folks:

I´m running a query which is designed to generate a foreign key for a
table of approx. 10 million records (I've mentioned this in an earlier
posting). The table is called "indethom", and each row contains a
single word from the works of St. Thomas Aquinas, along with
grammatical data about the word form, and (most importantly for my
current problem) a set of columns identifying the particular
work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on
performing some basic normalization, beginning with creating a table
called "s2.sectiones" which (naturally) contains a complete listing of
all of the sections of all the works of St. Thomas. I will then
eliminate this information from the original "indethom" table,
replacing it with the foreign key I am currently generating.

** My question has to do with whether or not I am getting maximal speed
out of PostgreSQL, or whether I need to perform further optimizations.
I am currently getting about 200,000 updates per hour, and updating the
entire 10 million rows thus requires 50 hours, which seems a bit much.

Here's the query I am running:
update indethom
set query_counter = nextval('s2.query_counter_seq'), -- Just
for keeping track of how fast the query is running
sectref = (select clavis from s2.sectiones where
s2.sectiones.nomeoper = indethom.nomeoper
and s2.sectiones.refere1a = indethom.refere1a and
s2.sectiones.refere1b = indethom.refere1b
and s2.sectiones.refere2a = indethom.refere2a and
s2.sectiones.refere2b = indethom.refere2b
and s2.sectiones.refere3a = indethom.refere3a and
s2.sectiones.refere3b = indethom.refere3b
and s2.sectiones.refere4a = indethom.refere4a and
s2.sectiones.refere4b = indethom.refere4b);

Here´s the query plan:
QUERY PLAN
------------------------------------------------------------------------
-------------
Seq Scan on indethom (cost=0.00..1310352.72 rows=10631972 width=212)
SubPlan
-> Index Scan using sectiones_ndx on sectiones (cost=0.00..6.03
rows=1 width=4)
Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND
(refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a =
$5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note: I have just performed a VACUUM ANALYZE on the indethom table, as
suggested by this listserve.

Here's the structure of the s2.sectiones table:
it=> \d s2.sectiones
Table "s2.sectiones"
Column | Type | Modifiers
----------+--------------+-----------
nomeoper | character(3) |
refere1a | character(2) |
refere1b | character(2) |
refere2a | character(2) |
refere2b | character(2) |
refere3a | character(2) |
refere3b | character(2) |
refere4a | character(2) |
refere4b | character(2) |
clavis | integer |
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a,
refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns
not shown):
it=> \d indethom
Table "public.indethom"
Column | Type | Modifiers
---------------+-----------------------+-----------
numeoper | smallint | not null
nomeoper | character(3) | not null
editcrit | character(1) |
refere1a | character(2) |
refere1b | character(2) |
refere2a | character(2) |
refere2b | character(2) |
refere3a | character(2) |
refere3b | character(2) |
refere4a | character(2) |
refere4b | character(2) |
refere5a | character(2) | not null
refere5b | smallint | not null
referen6 | smallint | not null
... several columns skipped ...
verbum | character varying(22) | not null
... other columns skipped ...
poslinop | integer | not null
posverli | smallint | not null
posverop | integer | not null
clavis | integer | not null
articref | integer |
sectref | integer |
query_counter | integer |
Indexes: indethom_pkey primary key btree (clavis),
indethom_articulus_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b),
indethom_sectio_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2003-12-02 16:29:15 Re: Update performance ... is 200,000 updates per hour
Previous Message Tom Lane 2003-12-02 00:45:29 Re: A question on the query planner