From: "Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: atrocious update performance
Date: 2004-03-16 19:58:47
Message-ID: 001e01c40b91$181d66c0$2500fa0a@CardServices.TCI.com
Lists: pgsql-performance
while you weren't looking, Tom Lane wrote:
> ...slow FK checks could be your problem if the application is set
> up to issue multiple UPDATEs affecting the same row(s) during a
> single transaction. I'm not clear on whether that applies to you
> or not.
It shouldn't. It's just one large batch update that should be hitting
every row serially.
> And anyway the bottom line is: have you got indexes on the columns
> *referencing* account.cust.custid?
No. I'd've sworn I had one on account.acct.custid, since that table
is populated (currently ~500K rows), but it's not.
$ time psql tci -c "explain analyze select * from account.acct where
custid = 257458"
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on acct (cost=0.00..7166.68 rows=2 width=71) (actual
time=1047.122..1047.122 rows=0 loops=1)
Filter: (custid = 257458)
Total runtime: 1047.362 ms
(3 rows)
real 0m1.083s
user 0m0.010s
sys 0m0.000s
If it's looking up the custid in account.acct for each row, that's,
say, 1 second per lookup, for 4.7 million lookups: if my math is
right (4,731,410 s / 3600 / 24), about 54 days. I suppose that
tracks, but it doesn't make sense, given what you said about the fk
checks, above.
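A quick back-of-the-envelope check of that figure, using the row count
from the planner estimates in this message:

```python
# Sanity check of the ~54-day estimate above.
rows = 4_731_410        # rows to update, per the planner estimate
secs_per_lookup = 1.0   # ~1 s per unindexed seq-scan FK lookup, per the timing above
days = rows * secs_per_lookup / 3600 / 24
print(round(days, 1))   # -> 54.8
```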
Of course, if I index the column and amend the query to say "where
custid = 194752::bigint" I get back much saner numbers:
QUERY PLAN
----------------------------------------------------------------------
Index Scan using ix_fk_acct_custid on acct (cost=0.00..3.34 rows=2
width=71) (actual time=0.126..0.141 rows=2 loops=1)
Index Cond: (custid = 194752::bigint)
Total runtime: 0.314 ms
(3 rows)
real 0m0.036s
user 0m0.010s
sys 0m0.000s
Which would still take just under two days.
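For reference, the index behind that plan is just a plain btree on the
referencing column; a sketch (the name ix_fk_acct_custid is taken from
the Index Scan line above):

```sql
-- Index the referencing column so each FK check is an index probe
-- rather than a seq scan over ~500K rows.
CREATE INDEX ix_fk_acct_custid ON account.acct (custid);
ANALYZE account.acct;  -- refresh stats so the planner will use it
```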
$ time psql tci -c "explain analyze update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid"
But if I'm not touching the column referenced from account.acct, why
would it be looking there at all? I've got an explain analyze of the
update running now, but until it finishes, I can't say for certain
what it's doing. explain, alone, says:
$ time psql tci -c "explain update account.cust set prodid =
tempprod.prodid, subprodid = tempprod.subprodid where origid =
tempprod.debtid;"
QUERY PLAN
---------------------------------------------------------------------
Merge Join (cost=0.00..232764.69 rows=4731410 width=252)
Merge Cond: (("outer".origid)::text = ("inner".debtid)::text)
-> Index Scan using ix_origid on cust (cost=0.00..94876.83
rows=4731410 width=236)
-> Index Scan using ix_debtid on tempprod (cost=0.00..66916.71
rows=4731410 width=26)
(4 rows)
real 0m26.965s
user 0m0.010s
sys 0m0.000s
which shows it not hitting account.acct at all. (And why did it take
the planner 20-some seconds to come up with that query plan?)
tempprod doesn't have an index either, but then it doesn't reference
account.cust; instead, the update would be done by joining the two on
debtid/origid, which map one-to-one, are both indexed, with both
tables clustered on those indices--exactly as in the CREATE TABLE AS
approach Aaron suggested elsethread.
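A minimal sketch of that CREATE TABLE AS variant, assuming the columns
named in the update; the new-table name and the follow-up steps are
placeholders, not something I've run:

```sql
-- Hypothetical sketch: build a fresh table instead of updating in
-- place, sidestepping per-row FK checks and dead-tuple churn.
CREATE TABLE account.cust_new AS
SELECT c.custid,
       c.origid,
       t.prodid,       -- new values come straight from tempprod
       t.subprodid
       -- ... remaining account.cust columns ...
FROM account.cust c
JOIN tempprod t ON t.debtid = c.origid;
-- Afterward: recreate indexes and constraints, then swap the tables.
```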
Unfortunately, this isn't the only large update we'll have to do. We
receive a daily file of ~100K rows that may have new values for any
field of any row in account.cust, .acct or sundry other tables. The
process of updating from that file is time-critical; it must run in
minutes, at the outside.
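Once the referencing columns are indexed, that daily delta could
plausibly be applied with a staging table and the same join-style
update used above; the staging-table and file names here are
assumptions:

```sql
-- Hypothetical sketch of the daily ~100K-row update.
COPY staging_daily FROM '/path/to/daily_file';  -- load the feed
ANALYZE staging_daily;

UPDATE account.cust
SET    prodid    = staging_daily.prodid,
       subprodid = staging_daily.subprodid
WHERE  origid = staging_daily.debtid;           -- implicit join, as above
```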
/rls
--
Rosser Schwarz
Total Card, Inc.