Re: Comparitive UPDATE speed

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Comparitive UPDATE speed
Date: 2002-10-04 15:54:56
Message-ID: web-1771475@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Folks,

Sorry for the double-quoting here. I sent this to just Ron by
accident. My original question is double-quoted, Ron is quoted, and my
responses are below. Thanks!

> > Ok, I'm still confused.
> >
> > I'm updating a (not not indexed) field in a 117,000 row table based
> on
> > information in another 117,000 row table. The update is an
> integer, and the
> > linking fields are indexed. Yet the two queries are flattening my
>
> > dual-processor, RAID5 database server for up to 11 minutes ...
> using 230mb
> > ram the entire time. I simply can't believe that these two
> queries are that
> > difficult.
>
> So there's no index on elbs_matter_links.case_id? From your original
>
> post, I thought that there *is* an index on that field.

I'm now dropping it before the update. Unfortunately, dropping the
index made no appreciable gain in performance.

> > I've increased the memory available to the update to 256mb, and
> tried forcing
> > an index scan ... to no avail. Ideas, please?
> >
> > The queries:
> >
> > UPDATE elbs_matter_links SET case_id = case_clients.case_id
> > FROM case_clients
> > WHERE elbs_matter_links.mmatter = case_clients.matter_no;
>
> What happens if you run the query:
> SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no
> FROM elbs_matter_links eml,
> case_clients cc
> WHERE eml.mmatter = cc.matter_no;
>
> That, for all intents and purposes, is your UPDATE statement, just
> without doing the UPDATE. How fast does it run?

Slowly. It takes about 60 seconds to return data. This may be the
problem. Thoughts? Here's EXPLAIN output:

Hash Join (cost=3076.10..91842.88 rows=108648 width=40)
-> Seq Scan on elbs_matter_links eml (cost=0.00..85641.87
rows=117787 width=20)
-> Hash (cost=2804.48..2804.48 rows=108648 width=20)
-> Seq Scan on case_clients cc (cost=0.00..2804.48
rows=108648 width=20)

According to the parser, using the indexes would be worse:

Merge Join (cost=0.00..520624.38 rows=108648 width=40)
-> Index Scan using idx_eml_mmatter on elbs_matter_links eml
(cost=0.00..451735.00 rows=117787 width=20)
-> Index Scan using idx_caseclients_matter on case_clients cc
(cost=0.00..66965.20 rows=108648 width=20)

Though in practice, a forced index scan returns rows in about 60
seconds, same as the SeqScan version.

All of this seems very costly for a query that, while it does return a
lot of rows, is essentially a very simple query.

More importantly, on the hardware I'm using, I would expect better
performance that I get on my laptop ... and I'm not seeing it. I just
can't believe that the simple query above could soak 200mb of RAM for a
full 60 seconds to return a result. It's like queries over a certain
result size on the system choke postgres.

My reference data below:
==============================================

>
> > UPDATE elbs_matter_links SET case_id = cases.case_id
> > FROM cases
> > WHERE elbs_matter_links.docket = cases.docket
> > AND elbs_matter_links.case_id IS NULL;
> >
> >
> > EXPLAIN output:
> >
> > Hash Join (cost=4204.83..39106.77 rows=8473 width=299)
> > -> Index Scan using idx_eml_mmatter on elbs_matter_links
> > (cost=0.00..34668.94 rows=8473 width=279)
> > -> Hash (cost=2808.38..2808.38 rows=109038 width=20)
> > -> Seq Scan on case_clients (cost=0.00..2808.38
> rows=109038
> > width=20)
> >
> > Nested Loop (cost=0.00..32338.47 rows=99 width=300)
> > -> Seq Scan on cases (cost=0.00..9461.97 rows=4297 width=21)
> > -> Index Scan using idx_eml_docket on elbs_matter_links
> (cost=0.00..5.31
> > rows=1 width=279)
> >
> > Table defintions:
> >
> > Table "elbs_matter_links"
> > Column | Type | Modifiers
> > ------------------+-----------------------+-----------------------
> > mmatter | character varying(15) | not null
> > case_id | integer |
> > matter_check | character varying(20) | not null default 'OK'
> > docket | character varying(50) |
> > case_name | character varying(50) |
> > practice | character varying(50) |
> > opp_counsel_name | character varying(50) |
> > opp_counsel_id | integer |
> > act_type | character varying(10) |
> > lead_case_id | integer |
> > lead_case_docket | character varying(50) |
> > disease | character varying(50) |
> > docket_no | character varying(25) |
> > juris_state | character varying(6) |
> > juris_local | character varying(20) |
> > status | smallint | not null default 1
> > client_id | integer |
> > office_loc | character varying(5) |
> > date_filed | date |
> > date_served | date |
> > date_resolved | date |
> > case_status | character varying(5) |
> > settle_amount | numeric(12,2) | default 0
> > narrative | text |
> > comment | character varying(50) |
> > client_no | character varying(10) |
> > juris_id | integer |
> > Indexes: idx_eml_check,
> > idx_eml_docket,
> > idx_eml_mmatter
> > Primary key: elbs_matter_links_pkey
> >
> > Table "case_clients"
> > Column | Type |
> Modifiers
> >
>
------------------+-----------------------+----------------------------------------------------
> > case_client_id | integer | not null default
> > nextval('case_clients_seq'::text)
> > case_id | integer | not null
> > client_id | integer | not null
> > office_loc | character varying(5) |
> > date_filed | date |
> > date_served | date |
> > date_resolved | date |
> > matter_no | character varying(15) | not null
> > case_status | character varying(5) | not null
> > settle_amount | numeric(14,2) | not null default 0
> > matter_narrative | text |
> > comment | character varying(50) |
> > Indexes: idx_case_clients_client,
> > idx_caseclients_case,
> > idx_caseclients_matter,
> > idx_caseclients_resolved,
> > idx_caseclients_served,
> > idx_caseclients_status
> > Primary key: case_clients_pkey
> >
> >
> > Table "cases"
> > Column | Type |
> Modifiers
> >
>
------------------+-----------------------+---------------------------------------------
> > case_id | integer | not null default
> > nextval('cases_seq'::text)
> > docket | character varying(50) | not null
> > case_name | character varying(50) | not null
> > practice | character varying(50) | not null
> > opp_counsel_name | character varying(50) |
> > opp_counsel_id | integer |
> > act_type | character varying(10) |
> > lead_case_id | integer |
> > lead_case_docket | character varying(50) |
> > disease | character varying(50) |
> > docket_no | character varying(25) | not null
> > juris_state | character varying(6) | not null
> > juris_local | character varying(20) |
> > tgroup_id | integer |
> > status | smallint | not null default 1
> > juris_id | integer |
> > Indexes: idx_case_cases_juris,
> > idx_cases_docket,
> > idx_cases_lead,
> > idx_cases_name,
> > idx_cases_status,
> > idx_cases_tgroup,
> > idx_lower_case_name
> >
> >
> >
> > --
> > Josh Berkus
> > josh(at)agliodbs(dot)com
> > Aglio Database Solutions
> > San Francisco
> --
> +------------------------------------------------------------+
> | Ron Johnson, Jr. mailto:ron(dot)l(dot)johnson(at)cox(dot)net |
> | Jefferson, LA USA http://members.cox.net/ron.l.johnson |
> | |
> | "What other evidence do you have that they are terrorists, |
> | other than that they trained in these camps?" |
> | 17-Sep-2002 Katie Couric to an FBI agent regarding the 5 |
> | men arrested near Buffalo NY |
> +------------------------------------------------------------+
>

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2002-10-04 16:05:10 Re: [HACKERS] Large databases, performance
Previous Message Shridhar Daithankar 2002-10-04 08:00:54 Re: [HACKERS] Large databases, performance