From: | bob_bamber(at)hotmail(dot)com (Bob) |
---|---|
To: | pgsql-novice(at)postgresql(dot)org |
Subject: | Slow subquery on large dataset |
Date: | 2004-02-19 22:07:10 |
Message-ID: | b598ae2.0402191407.c1548ea@posting.google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
Hi,
I'm having some performance issues when querying a couple of tables
containing a large amount of data.
Here's the schema:
CREATE TABLE capacity_data (
data_id SERIAL,
data TEXT,
modified TIMESTAMPTZ DEFAULT NOW(),
modified_by INTEGER NOT NULL,
CONSTRAINT capacity_data_pk PRIMARY KEY (data_id),
CONSTRAINT capacity_data_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);
CREATE TABLE capacities (
CREATE TABLE capacities (
room_id BIGINT NOT NULL,
capacity_type_id BIGINT NOT NULL,
data_id BIGINT NOT NULL,
modified TIMESTAMPTZ DEFAULT NOW(),
modified_by INTEGER NOT NULL,
CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id),
CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES
meeting_rooms(room_id) ON DELETE CASCADE,
CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY
(capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON
DELETE CASCADE,
CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES
capacity_data(data_id) ON DELETE CASCADE,
CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
); data_id BIGINT NOT NULL,
modified TIMESTAMPTZ DEFAULT NOW(),
modified_by INTEGER NOT NULL,
CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id),
CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES
meeting_rooms(room_id) ON DELETE CASCADE,
CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY
(capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON
DELETE CASCADE,
CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES
capacity_data(data_id) ON DELETE CASCADE,
CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);
I'm using a subquery to find all the capacity_data.data_id's that are
not in capacities:
foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT
data_id FROM capacities);
However, I have over 15,000 records in capacity_data. Here is the
query plan:
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on capacity_data (cost=0.00..2086295.56 rows=7538 width=4)
Filter: (subplan)
SubPlan
-> Seq Scan on capacities (cost=0.00..276.75 rows=15075
width=8)
(4 rows)
A little on the slow side! I have indexes on data_id in both tables
(in capacity_data it's the primary key) how can I use them to quickly
acheive what I want?
Thanks in advance,
Bob.
From | Date | Subject | |
---|---|---|---|
Next Message | kynn | 2004-02-20 00:23:29 | psql hangs after "drop table ..." |
Previous Message | Joe Conway | 2004-02-19 18:58:24 | Re: crosstabs |