Re: DELETE with filter on ctid

From: "Spiegelberg, Greg" <gspiegelberg(at)cranel(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: DELETE with filter on ctid
Date: 2007-04-10 13:35:14
Message-ID: 82E74D266CB9B44390D3CCE44A781ED90B6ADD@POSTOFFICE.cranel.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom et al,

Sometimes it takes a look from someone on the outside to get the job
done right.

Below is, I believe, everything pertinent to this problem. First is the
table in question, second is the problematic and original query, and
final is the transaction that I have working today with the CTID
implementation.

I would welcome any feedback.

TIA,
Greg

cranel=# \d sid2.data_id_table
Table "sid2.data_id_table"
Column | Type | Modifiers
-------------+---------+---------------
point_id | bigint |
dtype_id | bigint |
segment_id | bigint |
key1_id | bigint | not null
key2_id | bigint |
data_id | bigint | not null
deleted | boolean | default false
removed | boolean | default false
added | boolean | default false
persist | boolean | default false
Indexes:
"data_id_table_data_id_indx" btree (data_id)
"data_id_table_dtype_id_indx" btree (dtype_id)
"data_id_table_dtype_ss_id_indx" btree (dtype_id, point_id)
"data_id_table_key1_id_indx" btree (key1_id)
"data_id_table_key2_id_indx" btree (key2_id)
"data_id_table_mod_dtype_ss_id_indx" btree (segment_id, dtype_id,
point_id)
"data_id_table_segment_id_indx" btree (segment_id)
"data_id_table_point_id_indx" btree (point_id)

cranel=# explain analyze
DELETE FROM sid2.data_id_table AS dd
USING public.points AS ss,
(SELECT markeddel.*
FROM (SELECT d.*
FROM sid2.data_id_table d,public.points s
WHERE s.systems_id=2 AND s.id<2 AND s.permpoint=FALSE
AND s.id=d.point_id AND d.persist=FALSE
AND d.dtype_id=3) AS markeddel
JOIN
(SELECT DISTINCT ON (d.key1_id,d.key2_id) d.*
FROM sid2.data_id_table d,public.points s
WHERE s.systems_id=2 AND s.id<=2 AND s.id=d.point_id
AND d.dtype_id=3
ORDER BY d.key1_id,d.key2_id,d.point_id DESC) AS rollup
ON
(markeddel.key1_id,markeddel.key2_id)=(rollup.key1_id,rollup.key2_id)
WHERE markeddel.point_id<>rollup.point_id) ru
WHERE ss.systems_id=2 AND ss.id<2 AND ss.permpoint=FALSE AND
ss.id=dd.point_id
AND dd.persist=FALSE AND dd.dtype_id=3
AND
(dd.point_id,dd.key1_id,dd.key2_id)=(ru.point_id,ru.key1_id,ru.key2_id);

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------
Nested Loop (cost=1037.06..1130.46 rows=1 width=6) (actual
time=33291.639..678047.543 rows=564 loops=1)
Join Filter: ((dd.point_id = d.point_id) AND (d.point_id <>
rollup.point_id))
-> Merge Join (cost=1028.10..1117.47 rows=1 width=70) (actual
time=1775.971..3721.991 rows=156750 loops=1)
Merge Cond: ((rollup.key1_id = dd.key1_id) AND (rollup.key2_id
= dd.key2_id))
-> Unique (cost=629.66..659.24 rows=3944 width=52) (actual
time=896.293..1571.591 rows=156779 loops=1)
-> Sort (cost=629.66..639.52 rows=3944 width=52)
(actual time=896.285..1080.444 rows=157342 loops=1)
Sort Key: d.key1_id, d.key2_id, d.point_id
-> Nested Loop (cost=0.00..394.10 rows=3944
width=52) (actual time=8.846..529.901 rows=157352 loops=1)
-> Seq Scan on points s (cost=0.00..1.72
rows=1 width=8) (actual time=0.064..0.096 rows=2 loops=1)
Filter: ((systems_id = 2) AND (id <=
2))
-> Index Scan using
data_id_table_point_id_indx on data_id_table d (cost=0.00..339.79
rows=4207 width=52) (actual time=4.649..155.174 rows=78676 loops=2)
Index Cond: (s.id = d.point_id)
Filter: (dtype_id = 3)
-> Sort (cost=398.44..398.64 rows=82 width=46) (actual
time=879.658..1109.830 rows=156750 loops=1)
Sort Key: dd.key1_id, dd.key2_id
-> Nested Loop (cost=0.00..395.83 rows=82 width=46)
(actual time=5.197..549.873 rows=156750 loops=1)
-> Nested Loop (cost=0.00..3.45 rows=1 width=16)
(actual time=0.055..0.107 rows=1 loops=1)
Join Filter: (ss.id = s.id)
-> Seq Scan on points ss (cost=0.00..1.72
rows=1 width=8) (actual time=0.037..0.052 rows=1 loops=1)
Filter: ((systems_id = 2) AND (id < 2)
AND (NOT permpoint))
-> Seq Scan on points s (cost=0.00..1.72
rows=1 width=8) (actual time=0.006..0.039 rows=1 loops=1)
Filter: ((systems_id = 2) AND (id < 2)
AND (NOT permpoint))
-> Index Scan using data_id_table_point_id_indx on
data_id_table dd (cost=0.00..339.79 rows=4207 width=30) (actual
time=5.135..342.406 rows=156750 loops=1)
Index Cond: (ss.id = dd.point_id)
Filter: ((NOT persist) AND (dtype_id = 3))
-> Bitmap Heap Scan on data_id_table d (cost=8.96..12.97 rows=1
width=24) (actual time=4.289..4.290 rows=1 loops=156750)
Recheck Cond: ((d.key1_id = rollup.key1_id) AND (d.key2_id =
rollup.key2_id))
Filter: ((NOT persist) AND (dtype_id = 3))
-> BitmapAnd (cost=8.96..8.96 rows=1 width=0) (actual
time=4.280..4.280 rows=0 loops=156750)
-> Bitmap Index Scan on data_id_table_key1_id_indx
(cost=0.00..4.32 rows=4 width=0) (actual time=0.020..0.020 rows=31
loops=156750)
Index Cond: (d.key1_id = rollup.key1_id)
-> Bitmap Index Scan on data_id_table_key2_id_indx
(cost=0.00..4.38 rows=13 width=0) (actual time=4.254..4.254 rows=26187
loops=156750)
Index Cond: (d.key2_id = rollup.key2_id)
Total runtime: 678063.873 ms
(34 rows)

cranel=# \timing
Timing is on.
cranel=# BEGIN;
BEGIN
Time: 0.340 ms

cranel=# CREATE INDEX data_id_table_ctid_idx ON
sid2.data_id_table(ctid);
CREATE INDEX
Time: 648.911 ms

cranel=# explain analyze
DELETE FROM sid2.data_id_table AS dd
USING public.points AS ss,
(SELECT markeddel.ctid
FROM (SELECT d.ctid,d.*
FROM sid2.data_id_table d,public.points s
WHERE s.systems_id=2
AND s.id<2
AND s.permpoint=FALSE
AND s.id=d.point_id
AND d.persist=FALSE
AND d.dtype_id=3) AS markeddel
JOIN
(SELECT DISTINCT ON (d.key1_id,d.key2_id) d.*
FROM sid2.data_id_table d,public.points s
WHERE s.systems_id=2
AND s.id<=2
AND s.id=d.point_id
AND d.dtype_id=3
ORDER BY d.key1_id,d.key2_id,d.point_id DESC) AS rollup
ON
(markeddel.key1_id,markeddel.key2_id)=(rollup.key1_id,rollup.key2_id)
WHERE markeddel.point_id<>rollup.point_id) ru
WHERE ss.systems_id=2
AND ss.id<2
AND ss.permpoint=FALSE
AND ss.id=dd.point_id
AND dd.persist=FALSE
AND dd.dtype_id=3
AND dd.ctid=ru.ctid;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------
Nested Loop (cost=1259.33..1378.37 rows=1 width=6) (actual
time=1807.429..2625.722 rows=562 loops=1)
-> Nested Loop (cost=1259.33..1378.08 rows=1 width=14) (actual
time=1807.372..2619.592 rows=562 loops=1)
-> Merge Join (cost=1259.33..1377.66 rows=1 width=6) (actual
time=1807.228..2606.901 rows=562 loops=1)
Merge Cond: ((rollup.key1_id = d.key1_id) AND
(rollup.key2_id = d.key2_id))
Join Filter: (d.point_id <> rollup.point_id)
-> Unique (cost=629.66..659.24 rows=3944 width=52)
(actual time=911.409..1271.121 rows=156779 loops=1)
-> Sort (cost=629.66..639.52 rows=3944 width=52)
(actual time=911.403..1024.775 rows=157342 loops=1)
Sort Key: d.key1_id, d.key2_id, d.point_id
-> Nested Loop (cost=0.00..394.10 rows=3944
width=52) (actual time=6.036..548.119 rows=157352 loops=1)
-> Seq Scan on points s
(cost=0.00..1.72 rows=1 width=8) (actual time=0.114..0.137 rows=2
loops=1)
Filter: ((systems_id = 2) AND (id
<= 2))
-> Index Scan using
data_id_table_point_id_indx on data_id_table d (cost=0.00..339.79
rows=4207 width=52) (actual time=3.216..155.284
rows=78676 loops=2)
Index Cond: (s.id = d.point_id)
Filter: (dtype_id = 3)
-> Sort (cost=629.66..639.52 rows=3944 width=30)
(actual time=875.213..980.618 rows=156750 loops=1)
Sort Key: d.key1_id, d.key2_id
-> Nested Loop (cost=0.00..394.10 rows=3944
width=30) (actual time=5.864..553.290 rows=156750 loops=1)
-> Seq Scan on points s (cost=0.00..1.72
rows=1 width=8) (actual time=0.022..0.053 rows=1 loops=1)
Filter: ((systems_id = 2) AND (id < 2)
AND (NOT permpoint))
-> Index Scan using
data_id_table_point_id_indx on data_id_table d (cost=0.00..339.79
rows=4207 width=30) (actual time=5.831..355.139 rows=156750 loops=1)
Index Cond: (s.id = d.point_id)
Filter: ((NOT persist) AND (dtype_id =
3))
-> Index Scan using data_id_table_ctid_idx on data_id_table dd
(cost=0.00..0.41 rows=1 width=14) (actual time=0.017..0.019 rows=1
loops=562)
Index Cond: (dd.ctid = d.ctid)
Filter: ((NOT persist) AND (dtype_id = 3))
-> Index Scan using points_pkey on points ss (cost=0.00..0.28
rows=1 width=8) (actual time=0.005..0.007 rows=1 loops=562)
Index Cond: ((ss.id < 2) AND (ss.id = dd.point_id))
Filter: ((systems_id = 2) AND (NOT permpoint))
Total runtime: 2641.820 ms
(29 rows)
Time: 2652.940 ms

cranel=# DROP INDEX data_id_table_ctid_idx;
DROP INDEX
Time: 33.653 ms

cranel=# DELETE FROM sid2.data_id_table AS dd WHERE dd.point_id=2 AND
dd.dtype_id=3 AND dd.deleted AND NOT dd.persist;
DELETE 0
Time: 0.960 ms

cranel=# COMMIT;
Time: 20.500 ms

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, April 09, 2007 4:55 PM
To: Spiegelberg, Greg
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] DELETE with filter on ctid

"Spiegelberg, Greg" <gspiegelberg(at)cranel(dot)com> writes:
> We have a query which generates a small set of rows (~1,000) which are
> to be used in a DELETE on the same table. The problem we have is that
> we need to join on 5 different columns and it takes far too long. I
> have a solution but I'm not sure it's the right one. Instead of
joining
> on 5 columns in the DELETE the join uses the ctid column.

> BEGIN;
> CREATE INDEX gregs_table_ctid_idx ON gregs_table(ctid);
> DELETE FROM gregs_table gt
> USING (SELECT ctid FROM gregs_table WHERE ...) as s
> WHERE gt.ctid=s.ctid;
> DROP INDEX gregs_table_ctid_idx;
> COMMIT;

Forget the index, it's useless here (hint: ctid is a physical address).
I'm wondering though why you don't just transpose the subquery's WHERE
condition into the DELETE's WHERE? Or is this example oversimplified?

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Spiegelberg, Greg 2007-04-10 13:37:34 Re: DELETE with filter on ctid
Previous Message Mike Gargano 2007-04-10 13:15:33 Re: Beginner Question