From: | "Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | atrocious update performance |
Date: | 2004-03-15 20:28:46 |
Message-ID: | 001401c40acc$1e4e7180$2500fa0a@CardServices.TCI.com |
Lists: pgsql-performance
We're in the throes of an MS SQL to PostgreSQL migration; our databases
include a number of ~5M row tables. We decided to take this opportunity
to clean up and slightly re-normalize our schemas, given what we've
learned about the data over its lifetime and such, else we wouldn't be
experiencing any of the following (we could instead just dump and `copy
from`).
We have a temporary table, public.tempprod, containing 4.7M rows, one
for each row in account.cust. account.cust has, among others, two
columns, prod and subprod, which we're trying to update from tempprod
joined against prod. The update takes unreasonably long--rather, we've
finally had to kill it after it had been running for an obscenely long
time.
The table:
# \d account.cust
Table "account.cust"
Column | Type | Modifiers
-----------+-----------------------------+--------------------------------------
custid | bigint | not null default nextval('account.custid_seq'::text)
ownerid | integer | not null
origid | text | not null
pname | text |
fname | text |
mname | text |
lname | text |
suffix | text |
addr1 | text |
addr2 | text |
addr3 | text |
city | text |
state | text |
zip | text |
zipplus | text |
homeph | text |
workph | text |
otherph | text |
ssn | text |
isactive | boolean | default true
createddt | timestamp without time zone | default now()
prodid | bigint |
subprodid | bigint |
Indexes:
"cust_pkey" primary key, btree (custid)
"ix_addr1" btree (addr1) WHERE (addr1 IS NOT NULL)
"ix_addr2" btree (addr2) WHERE (addr2 IS NOT NULL)
"ix_city" btree (city) WHERE (city IS NOT NULL)
"ix_fname" btree (fname) WHERE (fname IS NOT NULL)
"ix_homeph" btree (homeph) WHERE (homeph IS NOT NULL)
"ix_lname" btree (lname) WHERE (lname IS NOT NULL)
"ix_mname" btree (mname) WHERE (mname IS NOT NULL)
"ix_origid" btree (origid)
"ix_ssn" btree (ssn) WHERE (ssn IS NOT NULL)
"ix_state" btree (state) WHERE (state IS NOT NULL)
"ix_workph" btree (workph) WHERE (workph IS NOT NULL)
"ix_zip" btree (zip) WHERE (zip IS NOT NULL)
We're currently running on a dual Xeon 700 (I know, I know; it's what
we've got) with 2.5GB RAM and 4x36GB SCSI in hardware RAID 5 (Dell
Perc3 something-or-other controller). If we can demonstrate that
PostgreSQL will meet our needs, we'll be going production on a dual
Opteron, maxed memory, with a 12-disk Fibre Channel array.
The query is:
update account.cust set prodid =
(select p.prodid from account.prod p
join public.tempprod t on t.pool = p.origid
where custid = t.did)
And then, upon its completion, s/prod/subprod/.
That shouldn't run overnight, should it, let alone for -days-?
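For clarity, the join-style equivalent we're after--using PostgreSQL's
UPDATE ... FROM extension rather than a correlated subselect--would be
something like the following sketch (assuming tempprod.did is meant to
match cust.origid, as in the merge-join plan further down):

update account.cust
set prodid = p.prodid
from public.tempprod t
join account.prod p on p.origid = t.pool  -- same keys as the subselect above
where cust.origid = t.did;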
In experimenting with ways of making the updates take less time, we tried
adding product and subproduct columns to tempprod, and updating those.
That seemed to work marginally better:
explain analyze update public.tempprod set prodid =
(select account.prod.prodid::bigint
from account.prod
where public.tempprod.pool::text = account.prod.origid::text)
Seq Scan on tempprod (cost=0.00..9637101.35 rows=4731410 width=56) (actual
time=24273.467..16090470.438 rows=4731410 loops=1)
SubPlan
-> Limit (cost=0.00..2.02 rows=2 width=8) (actual time=0.134..0.315
rows=1 loops=4731410)
-> Seq Scan on prod (cost=0.00..2.02 rows=2 width=8) (actual
time=0.126..0.305 rows=1 loops=4731410)
Filter: (($0)::text = (origid)::text)
Total runtime: 2284551.962 ms
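(Those numbers at least hang together: the subplan's seq scan on prod
runs once per row, and 4,731,410 loops at roughly 0.3 ms apiece is on
the order of 1,500 seconds, which is the bulk of the ~2,285-second
total runtime.)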
But then going from public.tempprod to account.cust again takes days. I
just cancelled an update that's been running since last Thursday.
Alas, given how long the queries take to run, I can't supply an `explain
analyze`. The `explain` seems reasonable enough:
# explain update account.cust set prodid = tempprod.prodid
where tempprod.did = origid;
Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
Merge Cond: (("outer".origid)::text = ("inner".did)::text)
-> Index Scan using ix_origid on cust (cost=0.00..94876.83
rows=4731410 width=244)
-> Index Scan using ix_did on tempprod (cost=0.00..66916.71
rows=4731410 width=18)
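Since a full `explain analyze` is out of the question at this scale,
the closest I can offer is timing the same plan over a small, arbitrary
slice of the key space and extrapolating; a sketch:

begin;
explain analyze
update account.cust
set prodid = tempprod.prodid
from public.tempprod
where tempprod.did = cust.origid
and cust.custid < 10000;  -- small, arbitrary slice
rollback;  -- throw away the trial update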
The relevant bits from my postgresql.conf (note, we built with a BLCKSZ
of 16K):
shared_buffers = 4096
sort_mem = 32768
vacuum_mem = 32768
wal_buffers = 16384
checkpoint_segments = 64
checkpoint_timeout = 1800
checkpoint_warning = 30
commit_delay = 50000
effective_cache_size = 131072
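(For scale, given the 16K BLCKSZ: shared_buffers = 4096 works out to
64MB and effective_cache_size = 131072 to 2GB, which seems sane against
the box's 2.5GB; wal_buffers = 16384 would be a whopping 256MB, if I
have the units right; sort_mem and vacuum_mem are in KB, so 32MB each.)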
Any advice, suggestions or comments of the "You bleeding idiot, why do
you have frob set to x?!" sort are welcome. Unfortunately, if we can't
improve this significantly, the powers that be will probably pass on
PostgreSQL, even though everything we've done so far--with this marked
exception--has performed pretty spectacularly, all told.
/rls
--
Rosser Schwarz
Total Card, Inc.