From: | pbj(at)cmicdo(dot)com |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, Igor Neyman <ineyman(at)perceptron(dot)com> |
Subject: | Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...) |
Date: | 2014-11-03 20:36:46 |
Message-ID: | 1415047006.57773.YahooMailBasic@web161705.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
>
> On Mon, 11/3/14, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:
>
> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org]
> On Behalf Of pbj(at)cmicdo(dot)com
> Sent: Monday, November 03, 2014 11:34 AM
> To: pgsql-general(at)postgresql(dot)org
> Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM
> vs UPDATE SET = (SELECT ...)
>
> Why does the UPDATE SET = FROM choose a more poorly
> performing plan than the UPDATE SET = (SELECT ...)? It
> seems to me that it is the same join.
>
> I'm using 9.3.5.
>
> CREATE TABLE orig
> (
> key1 VARCHAR(11)
> PRIMARY KEY,
> time1 TIME
> );
>
> INSERT INTO orig (key1, time1)
> SELECT
> a::TEXT,
> (((random()*100)::INT %
> 24)::TEXT || ':' ||
> ((random()*100)::INT %
> 60)::TEXT)::TIME FROM generate_series(80000000000,
> 80002000000) a;
>
> CREATE INDEX odx ON orig(key1);
>
> CREATE TABLE second (LIKE orig);
>
> INSERT INTO second (key1)
> SELECT
> (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT
> FROM generate_series(1,400000);
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = orig.time1
> FROM orig
> WHERE second.key1 = orig.key1;
>
[.....]
>
> UPDATE second SET time1 = NULL;
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = (SELECT orig.time1 FROM
> orig,second
>
> WHERE orig.key1 = second.key1
> LIMIT 1);
>
[.....]
>
> These 2 queries are not the same.
>
> The first query updates rows in the "second" table with the
> orig.time1 values based on key1 column match.
> The second query finds first possible match (based on key1
> column) and assigns orig.time1 value from the matched row to
> every record in "second" table.
>
> Regards,
> Igor Neyman
I see that now. I was trying to reproduce something from work from
memory and got tripped up on a sublety of UPDATE ... SELECT. The query
I ran at work was like this:
EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Update on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=18698.795..18698.795 rows=0 loops=1)
-> Seq Scan on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=7.558..16694.600 rows=400000 loops=1)
SubPlan 1
-> Index Scan using odx on orig (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=400000)
Index Cond: ((key1)::text = (se.key1)::text)
Total runtime: 18698.865 ms
(6 rows)
This does correctly match and update all of the second table entries.
The plan actually runs longer than the UPDATE ... FROM, which squares
with a comment the fine manual.
Thanks!
PJ
From | Date | Subject | |
---|---|---|---|
Next Message | Arthur Silva | 2014-11-04 01:38:57 | Dynomite from Netflix - Making Non-Distributed Databases, Distributed |
Previous Message | Tom Lane | 2014-11-03 20:16:47 | Re: STABLE vs. IMMUTABLE w.r.t. indexes |