Delete performance

From: Jana <jana(dot)vasseru(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Delete performance
Date: 2009-06-14 13:36:34
Message-ID: op.uvin28aig6o41l@truhlik
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice


Hello,
i have a table with about 250m records from which i want to delete thoose
not contained in other table. I used this SQL query:
DELETE FROM data_structures_items WHERE id_data_structure NOT IN (
SELECT id_structure FROM data_structures);

This ran for 24hours and in the mean time it read more than 20TB of disk
(then i run out of patience). Since the actual table is only about 16GB in
size (the whole database is ~50GB) the only explanation i can think of is
that the result of the sub-query is not cached and is read again for each
of 2.43655e+008 rows of data_structures_items table. Also i don't quite
get why is data_individual_structures_pkey having ~52MB when the actual
data contained is only ~343kb. Autovacuum is on and before doing this
query i also ran vacum full analyze. Please tell me there is some logic in
this and i am doing something wrong.

SELECT relname, reltuples, relpages FROM pg_class ;
relname reltuples relpages
data_structures 85820 2002
data_structures_id_individual_state 85820 6526
data_structures_id_structure_seq 1 1
data_structures_items 2.43655e+008 2030460
data_structures_items_depth 2.43655e+008 675971
data_structures_items_id_data_structure 2.43655e+008 668184
data_structures_items_id_data_structure_item_seq 1 1
data_structures_items_left 2.43655e+008 676334
data_structures_items_pkey 2.43655e+008 668074
data_individual_structures_pkey 85820 6526

CREATE TABLE data_structures_items
(
id_data_structure_item serial NOT NULL,
id_data_structure integer NOT NULL,
"text" character varying(255) NOT NULL,
lft integer NOT NULL,
rght integer NOT NULL,
depth integer NOT NULL,
description character varying(255),
CONSTRAINT data_structures_items_pkey PRIMARY KEY
(id_data_structure_item)
)
WITH (OIDS=TRUE);

CREATE INDEX data_structures_items_depth
ON data_structures_items
USING btree
(depth);

CREATE INDEX data_structures_items_id_data_structure
ON data_structures_items
USING btree
(id_data_structure);

CREATE INDEX data_structures_items_left
ON data_structures_items
USING btree
(lft);

CREATE TABLE data_structures
(
id_structure serial NOT NULL,
id_individual_state integer NOT NULL,
"text" text,
"timestamp" timestamp(0) without time zone DEFAULT now(),
CONSTRAINT data_individual_structures_pkey PRIMARY KEY (id_structure),
CONSTRAINT data_structures_fk FOREIGN KEY (id_individual_state)
REFERENCES data_individual_states (id_individual_state) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (OIDS=TRUE);
ALTER TABLE data_structures OWNER TO sanae;

CREATE INDEX data_structures_id_individual_state
ON data_structures
USING btree
(id_individual_state);

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Frank Bax 2009-06-14 16:12:50 Re: Delete performance
Previous Message Joshua Tolley 2009-06-13 20:37:33 Re: Views