From: | "Eric Jain" <Eric(dot)Jain(at)isb-sib(dot)ch> |
---|---|
To: | "pgsql-performance" <pgsql-performance(at)postgresql(dot)org> |
Subject: | UPDATE with subquery too slow |
Date: | 2004-02-17 12:38:19 |
Message-ID: | 00ed01c3f552$ec054e80$c300000a@caliente |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I can't get the following statement to complete with reasonable time.
I've had it running for over ten hours without getting anywhere. I
suspect (hope) there may be a better way to accomplish what I'm trying
to do (set fields containing unique values to null):
UPDATE requests
SET session = NULL
WHERE session IN
(
SELECT session
FROM requests
GROUP BY session
HAVING COUNT(*) = 1
);
Output of EXPLAIN:
Nested Loop
(cost=170350.16..305352.37 rows=33533 width=98)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=8)
-> Index Scan using requests_session_idx on requests
(cost=0.00..672.92 rows=168 width=106)
Index Cond: (requests."session" = "outer"."session")
If I drop the index on requests(session):
Hash Join
(cost=170350.66..340414.12 rows=33533 width=98)
Hash Cond: ("outer"."session" = "inner"."session")
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=106)
-> Hash
(cost=170350.16..170350.16 rows=200 width=8)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075
width=8)
The subquery itself requires 5-10 min to run on its own, and may return
several million rows.
Using EXISTS rather than IN (I'm using 7.4-RC2, not sure if IN queries
were already improved in this release):
UPDATE requests
SET session = NULL
WHERE NOT EXISTS
(
SELECT r.session
FROM requests r
WHERE
r.session = session
AND NOT r.id = id
);
With and without index:
Result
(cost=227855.74..415334.22 rows=8075449 width=101)
One-Time Filter: (NOT $0)
InitPlan
-> Seq Scan on requests r
(cost=0.00..227855.74 rows=201 width=8)
Filter: (("session" = "session") AND (id <> id))
-> Seq Scan on requests
(cost=0.00..187478.49 rows=8075449 width=101)
I've been running this for more than an hour so far, and no end in
sight, either... Any ideas?
From | Date | Subject | |
---|---|---|---|
Next Message | Leeuw van der, Tim | 2004-02-17 12:49:29 | Re: UPDATE with subquery too slow |
Previous Message | Shridhar Daithankar | 2004-02-17 11:00:52 | Re: Slow response of PostgreSQL |