Quick Links

Re: Query optimization question

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	"Daniel S(dot) Myers" <dmyers(at)pomona(dot)edu>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Query optimization question
Date:	2002-11-19 01:59:20
Message-ID:	3DD99AF8.7000704@joeconway.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Daniel S. Myers wrote:
> Hi,
> I’m working on a large-scale simulation study in biology, and
> I’m using PostgreSQL as the hub of a distributed computing system.
> Essentially, I have a single table containing 2.7 million rows, each of
> which represents a work unit to be processed (think SETI(at)Home). Among
> other fields, each row in the table contains a unique ID and a status,
> which is one of WAIT (not yet processed) OUT (processing) or DONE
> (completed). To dispatch a unit to a client, I’d like to pick a row with
> status = WAIT, mark it as OUT, and return the values. The problem I have
> is that finding a row takes a really long time (~22s on a 2-way PIII-700
> running Linux 2.4.19). My SQL looks like: select min(id) from analyses
> where status=’WAIT’. I have indexes on the id field and the status
> field, but an explain shows that they’re not being used—I’m assuming
> that the optimizer has decided that there’s no point in using the index,
> since it doesn’t sufficiently limit the scope of the search. Is there
> some optimization that I’m missing (I’ve tried indexes on status/id and
> id/status as well as the VACUUM and CLUSTER), or will I have to do
> something more than the naïve approach? (We’re actually sampling 2700
> points 1000 times, so I can use replicates_executed counters in each row
> and have a separate results table if I have to; it’d just be nice to
> keep things simple if possible).
>

You might try creating a partial index (see the CREATE INDEX page in the
documentation -- basically an index built on just part of a table qualified by
a where clause) on id where status='WAIT'. I've never actually used one
myself, but it seems like it might help in your situation. Also, instead of
min(id), try:

select id from analyses where status=’WAIT’ order by id limit 1;

For reasons discussed before on the list (please search the archives), min()
won't use your index, but ORDER BY ... LIMIT can.

HTH,

Joe

In response to

Query optimization question at 2002-11-19 01:49:28 from Daniel S. Myers

Browse pgsql-general by date

	From	Date	Subject
Next Message	Brian Minton	2002-11-19 03:36:24	get_bit etc.
Previous Message	Daniel S. Myers	2002-11-19 01:49:28	Query optimization question