need much better query perfomance

From: andy(at)mixonic(dot)com (Andy)
To: pgsql-general(at)postgresql(dot)org
Subject: need much better query perfomance
Date: 2003-01-24 23:28:26
Message-ID: 8d60b5ea.0301241528.230778b2@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here's the problem:

Table track has 100,000 rows
Table album has 20,000 rows
Table album_track is a lookup table linking tracks to albums and has
80,000 rows

The track table has a PK, a deleted field ('Y' for deleted tracks /
null otherwise), and several track-specific fields.
The album table also has a PK and a deleted field ('Y' or null), as
well as several album-specific fields.
The album_track table simple has two columns: The PKs of the album and
track tables.

A deleted track may still exist in an not-deleted album. A deleted
track is simply one that cannot be added to new albums.
Likewise, a deleted album may contain not-deleted tracks. It is
simply an album that the user has deleted.

So even though we've marked deleted albums & tracks in the database,
we haven't removed "deleted" tracks from our fileserver. Now we're
almost out of space, so we need to find all tracks that are marked as
deleted and which are not a part of any non-deleted albums.

I wrote what must be a very naive query to find such tracks. Doing an
explain on this query gave a huge time estimate. My query is below.
Would someone please suggest a faster approach?

select t.track_id
from track t
where t.deleted = 'Y'
UNION
select track_id from track where track_id not in (
select at.track_id from album_track at, album alb
where at.album_id = alb.album_id and alb.deleted is null
)

The above query should work, but it takes too long to execute. The
sample query below, however, is fast but incorrect:

select t.track_id
from track t
where t.deleted = 'Y'
UNION
select at.track_id
from album_track at, album alb
where at.album_id = alb.album_id and alb.deleted = 'Y'

This query is incorrect because it the bottom-most select is getting
all deleted albums, but a track might be in both deleted AND
non-deleted albums. This incorrect query will return all the
"completely deleted" tracks, but will also return tracks that are part
of both deleted and non-deleted albums, and we don't want this.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Justin Clift 2003-01-25 00:19:40 Re: Searchable 7.3.1 Documentation - now interactive!
Previous Message Andrew Sullivan 2003-01-24 22:12:06 Re: pg_dump automatic