From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | "Daniel S(dot) Myers" <dmyers(at)pomona(dot)edu> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Query optimization question |
Date: | 2002-11-19 01:59:20 |
Message-ID: | 3DD99AF8.7000704@joeconway.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Daniel S. Myers wrote:
> Hi,
> I’m working on a large-scale simulation study in biology, and
> I’m using PostgreSQL as the hub of a distributed computing system.
> Essentially, I have a single table containing 2.7 million rows, each of
> which represents a work unit to be processed (think SETI(at)Home). Among
> other fields, each row in the table contains a unique ID and a status,
> which is one of WAIT (not yet processed) OUT (processing) or DONE
> (completed). To dispatch a unit to a client, I’d like to pick a row with
> status = WAIT, mark it as OUT, and return the values. The problem I have
> is that finding a row takes a really long time (~22s on a 2-way PIII-700
> running Linux 2.4.19). My SQL looks like: select min(id) from analyses
> where status=’WAIT’. I have indexes on the id field and the status
> field, but an explain shows that they’re not being used—I’m assuming
> that the optimizer has decided that there’s no point in using the index,
> since it doesn’t sufficiently limit the scope of the search. Is there
> some optimization that I’m missing (I’ve tried indexes on status/id and
> id/status as well as the VACUUM and CLUSTER), or will I have to do
> something more than the naïve approach? (We’re actually sampling 2700
> points 1000 times, so I can use replicates_executed counters in each row
> and have a separate results table if I have to; it’d just be nice to
> keep things simple if possible).
>
You might try creating a partial index (see the CREATE INDEX page in the
documentation -- basically an index built on just part of a table qualified by
a where clause) on id where status='WAIT'. I've never actually used one
myself, but it seems like it might help in your situation. Also, instead of
min(id), try:
select id from analyses where status=’WAIT’ order by id limit 1;
For reasons discussed before on the list (please search the archives), min()
won't use your index, but ORDER BY ... LIMIT can.
HTH,
Joe
From | Date | Subject | |
---|---|---|---|
Next Message | Brian Minton | 2002-11-19 03:36:24 | get_bit etc. |
Previous Message | Daniel S. Myers | 2002-11-19 01:49:28 | Query optimization question |