From: | Dirk Mika <Dirk(dot)Mika(at)mikatiming(dot)de> |
---|---|
To: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Use ctid in where clause in update from statement |
Date: | 2019-07-01 09:13:19 |
Message-ID: | C8D480EC-AFF2-40CC-8BD3-754A1204011A@mikatiming.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
MERGE INTO test_large d
USING (SELECT ROWID, test_large.*
FROM test_large
WHERE grp = 1) s
ON (d.ROWID = s.ROWID)
WHEN MATCHED
THEN
UPDATE SET d.grp = s.grp;
It has the following execution plan:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1000 | 16000 | 2006 (0)| 00:00:25 |
| 1 | MERGE | TEST_LARGE | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS | | 1000 | 220K| 2006 (0)| 00:00:25 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST_LARGE | 1000 | 110K| 1006 (0)| 00:00:13 |
|* 5 | INDEX RANGE SCAN | IX_TL_GRP | 1000 | | 6 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID | TEST_LARGE | 1 | 113 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
In the PostgreSQL version I use the column ctid for this. The above query becomes:
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
I noticed that the execution plan for such a statement seems to be suboptimal:
Update on test_large d (cost=155346.98..160367.08 rows=996 width=53)
-> Merge Join (cost=155346.98..160367.08 rows=996 width=53)
Merge Cond: (d.ctid = test_large.ctid)
-> Sort (cost=154330.57..156833.16 rows=1001033 width=43)
Sort Key: d.ctid
-> Seq Scan on test_large d (cost=0.00..19366.33 rows=1001033 width=43)
-> Sort (cost=1016.40..1018.89 rows=996 width=10)
Sort Key: test_large.ctid
-> Index Scan using ix_tl_grp on test_large (cost=0.42..966.80 rows=996 width=10)
Index Cond: (grp = 1)
I expected a tid scan to be used instead of a Seq scan. I did a VACUUM test_large to make sure statistics are valid.
I would suspect that a tid scan is even a bit faster than accessing via the primary key, since reading the index is not required.
I know I could change the WHERE clause to use the primary key, which results in the following plan:
Update on test_large d (cost=0.85..3344.12 rows=995 width=57)
-> Nested Loop (cost=0.85..3344.12 rows=995 width=57)
-> Index Scan using ix_tl_grp on test_large (cost=0.42..994.84 rows=995 width=14)
Index Cond: (grp = 1)
-> Index Scan using pk_test_large on test_large d (cost=0.42..2.36 rows=1 width=47)
Index Cond: (id = test_large.id)
So my question is more "Is a tid scan not possible / useful for an update" than "how do I accelerate this query".
BR
Dirk
--
Dirk Mika
Software Developer
[cid:image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk(dot)mika(at)mikatiming(dot)de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpg]<https://youtu.be/qfOFXrpSKLQ>
From | Date | Subject | |
---|---|---|---|
Next Message | Achilleas Mantzios | 2019-07-01 09:27:16 | Re: Use ctid in where clause in update from statement |
Previous Message | Achilleas Mantzios | 2019-07-01 07:12:51 | Re: iterate over partitions |