From: Jochen Erwied <jochen(at)erwied(dot)eu>
To: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
Cc: antoine(at)inaps(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Duplicate deletion optimizations
Date: 2012-01-07 14:18:37
Message-ID: 1723406361.20120107151837@erwied.eu
Lists: pgsql-performance
Saturday, January 7, 2012, 1:21:02 PM you wrote:
> where t_imp.id is null and test.id=t_imp.id;
> =>
> where t_imp.id is not null and test.id=t_imp.id;
You're right, I overlooked that one. But the increase in execution time for
the query is - maybe not completely - surprisingly small.
Because the query that updates the id column of t_imp already fetches all
the rows of test that will be updated, those rows are cached, so the second
query runs entirely from cache. I suppose you'd see a severe performance
hit once the table no longer fits in cache...
I ran the loop again; after 30 minutes I'm at about 3-5 seconds per loop,
as long as the server isn't doing anything else. Under load it's at about
10-20 seconds, with a mix of roughly 40% updates and 60% inserts.
> and a partial index on matching rows might help (should be tested):
> (after the first update)
> create index t_imp_ix on t_imp(t_value,t_record,output_id) where t_imp.id is not null;
I don't think this will help much: t_imp is scanned sequentially anyway,
so creating the index is just unneeded overhead.
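If you want to double-check, something like this would show it (a sketch
only, using the same placeholder column names as above; wrapped in a
transaction because EXPLAIN ANALYZE actually executes the update):

BEGIN;
EXPLAIN ANALYZE
UPDATE test t
   SET counter = t.counter + i.counter   -- placeholder SET list
  FROM t_imp i
 WHERE i.id IS NOT NULL
   AND t.id = i.id;
ROLLBACK;
-- If the plan still shows "Seq Scan on t_imp" with the partial index in
-- place, the index is nothing but extra maintenance overhead here.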
--
Jochen Erwied | home: jochen(at)erwied(dot)eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 | work: joe(at)mbs-software(dot)de +49-2151-7294-24, FAX: -50
D-45470 Muelheim | mobile: jochen(dot)erwied(at)vodafone(dot)de +49-173-5404164