Quick Links

Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

From:	pbj(at)cmicdo(dot)com
To:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, Igor Neyman <ineyman(at)perceptron(dot)com>
Subject:	Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
Date:	2014-11-03 20:36:46
Message-ID:	1415047006.57773.YahooMailBasic@web161705.mail.bf1.yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

>
> On Mon, 11/3/14, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:
>
> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org]
> On Behalf Of pbj(at)cmicdo(dot)com
> Sent: Monday, November 03, 2014 11:34 AM
> To: pgsql-general(at)postgresql(dot)org
> Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM
> vs UPDATE SET = (SELECT ...)
>
> Why does the UPDATE SET = FROM choose a more poorly
> performing plan than the UPDATE SET = (SELECT ...)? It
> seems to me that it is the same join.
>
> I'm using 9.3.5.
>
> CREATE TABLE orig
> (
> key1 VARCHAR(11)
> PRIMARY KEY,
> time1 TIME
> );
>
> INSERT INTO orig (key1, time1)
> SELECT
> a::TEXT,
> (((random()*100)::INT %
> 24)::TEXT || ':' ||
> ((random()*100)::INT %
> 60)::TEXT)::TIME FROM generate_series(80000000000,
> 80002000000) a;
>
> CREATE INDEX odx ON orig(key1);
>
> CREATE TABLE second (LIKE orig);
>
> INSERT INTO second (key1)
> SELECT
> (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT
> FROM generate_series(1,400000);
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = orig.time1
> FROM orig
> WHERE second.key1 = orig.key1;
>
[.....]
>
> UPDATE second SET time1 = NULL;
>
> EXPLAIN ANALYZE
> UPDATE second SET time1 = (SELECT orig.time1 FROM
> orig,second
>
> WHERE orig.key1 = second.key1
> LIMIT 1);
>
[.....]
>
> These 2 queries are not the same.

>
> The first query updates rows in the "second" table with the
> orig.time1 values based on key1 column match.
> The second query finds first possible match (based on key1
> column) and assigns orig.time1 value from the matched row to
> every record in "second" table.
>
> Regards,
> Igor Neyman

I see that now. I was trying to reproduce something from work from
memory and got tripped up on a sublety of UPDATE ... SELECT. The query
I ran at work was like this:

EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Update on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=18698.795..18698.795 rows=0 loops=1)
-> Seq Scan on second se (cost=0.00..3390627.00 rows=400000 width=18) (actual time=7.558..16694.600 rows=400000 loops=1)
SubPlan 1
-> Index Scan using odx on orig (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=400000)
Index Cond: ((key1)::text = (se.key1)::text)
Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second table entries.
The plan actually runs longer than the UPDATE ... FROM, which squares
with a comment the fine manual.

Thanks!
PJ

In response to

Re: [SPAM] - Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...) at 2014-11-03 19:01:29 from Igor Neyman

Browse pgsql-general by date

	From	Date	Subject
Next Message	Arthur Silva	2014-11-04 01:38:57	Dynomite from Netflix - Making Non-Distributed Databases, Distributed
Previous Message	Tom Lane	2014-11-03 20:16:47	Re: STABLE vs. IMMUTABLE w.r.t. indexes