Re: atrocious update performance

From: "Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: atrocious update performance
Date: 2004-03-16 19:58:47
Message-ID: 001e01c40b91$181d66c0$2500fa0a@CardServices.TCI.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

while you weren't looking, Tom Lane wrote:

> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction. I'm not clear on whether that applies to you
> or not.

It shouldn't. It's just one large batch update that should be hitting
every row serially.

> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?

No. I'd've sworn I had one on account.acct.custid, since that table
is popupated (currently ~500K rows), but it's not.

$ time psql tci -c "explain analyze select * from account.acct where
custid = 257458"
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on acct (cost=0.00..7166.68 rows=2 width=71) (actual
time=1047.122..1047.122 rows=0 loops=1)
Filter: (custid = 257458)
Total runtime: 1047.362 ms
(3 rows)

real 0m1.083s
user 0m0.010s
sys 0m0.000s

If it is looking up the custid in account.acct for each row, that's,
say, 1 seconds per lookup, for 4.7 million lookups, for, if my math
is right (4,731,410 / 3600 / 24) 54 days. I suppose that tracks, but
that doesn't make sense, given what you said about the fk checks,
above.

Of course, if I index the column and amend the query to say "where
custid = 194752::bigint" I get back much saner numbers:

QUERY PLAN
----------------------------------------------------------------------
Index Scan using ix_fk_acct_custid on acct (cost=0.00..3.34 rows=2
width=71) (actual time=0.126..0.141 rows=2 loops=1)
Index Cond: (custid = 194752::bigint)
Total runtime: 0.314 ms
(3 rows)

real 0m0.036s
user 0m0.010s
sys 0m0.000s

Which would still take just under two days.

$ time psql tci -c "explain analyze update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid"

But if I'm not touching the column referenced from account.acct, why
would it be looking there at all? I've got an explain analyze of the
update running now, but until it finishes, I can't say for certain
what it's doing. explain, alone, says:

$ time psql tci -c "explain update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid;"
QUERY PLAN
---------------------------------------------------------------------
Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
-> Index Scan using ix_origid on cust (cost=0.00..94876.83
rows=4731410 width=236)
-> Index Scan using ix_debtid on tempprod (cost=0.00..66916.71
rows=4731410 width=26)
(4 rows)

real 0m26.965s
user 0m0.010s
sys 0m0.000s

which shows it not hitting account.acct at all. (And why did it take
the planner 20-some seconds to come up with that query plan?)

tempprod doesn't have an index either, but then it doesn't reference
account.cust; instead, the update would be done by joining the two on
debtid/origid, which map one-to-one, are both indexed, and with both
tables clustered on those indices--exactly as was the CREATE TABLE AS
Aaron suggested elsethread.

Unfortunately, this isn't the only large update we'll have to do. We
receive a daily, ~100K rows file that may have new values for any field
of any row in account.cust, .acct or sundry other tables. The process
of updating from that file is time-critical; it must run in minutes, at
the outside.

/rls

--
Rosser Schwarz
Total Card, Inc.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-03-16 20:14:46 Re: atrocious update performance
Previous Message Darcy Buskermolen 2004-03-16 18:54:35 Fwd: Configuring disk cache size on postgress