From: Jeff Boes <jboes(at)nexcerpt(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Large number of lookups
Date: 2002-08-30 19:40:29
Message-ID: akohm7$2rv0$1@news.hub.org
Lists: pgsql-sql
I have a large table (>2 million rows) against which my application runs some
1 million queries per day. The queries are almost all of the form
    select PRIMARY_KEY_FLD
      from MY_TABLE
     where SECONDARY_KEY_FLD = 'something';
I've optimized the table to the limits of what I can think of, now I'm
looking to optimize the application. I'm really only interested in the
yes/no question of whether the row exists (the returned PRIMARY_KEY_FLD
value is pretty much just for debug documentation). What I'm wondering
is whether my application would get better (faster) results if I ran a
number of queries at once.
For example, I could build SQL that looks like:
    select PRIMARY_KEY_FLD, SECONDARY_KEY_FLD
      from MY_TABLE
     where SECONDARY_KEY_FLD in (first_val, second_val, ...);
so that the list contains some number of values (either a limited "slice"
of the values I care about, or the whole pile--usually 100-200), and
iterate this until I've checked them all.
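(Not part of the original post: the application-side chunking described above
can be sketched in Python; the function names and the %s-placeholder style are
assumptions, and real code would bind the values as query parameters rather
than interpolating them.)

```python
def chunks(values, size):
    """Yield successive slices of `values`, each no longer than `size`."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

def build_in_query(vals):
    # One placeholder per value; the driver would substitute the actual
    # (properly quoted) values when the query is executed.
    placeholders = ", ".join(["%s"] * len(vals))
    return ("select PRIMARY_KEY_FLD, SECONDARY_KEY_FLD from MY_TABLE "
            "where SECONDARY_KEY_FLD in (%s)" % placeholders)

# 350 values in chunks of 200 gives two batches (200 + 150),
# i.e. two round trips instead of 350.
keys = ["k%d" % n for n in range(350)]
batches = [build_in_query(batch) for batch in chunks(keys, 200)]
```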
Another technique would be to construct a "UNION" table:
    select PRIMARY_KEY_FLD, SECONDARY_KEY_FLD
      from MY_TABLE
      join (select first_val as "SECONDARY_KEY_FLD"
            union select second_val
            union ...) as TEMP_TABLE
     using (SECONDARY_KEY_FLD);
likewise, either taking some number of my desired values in "chunks" or
all at once.
What's likely to work better, and why? (I'm going off to write a
benchmark script, but I'd like to hear some theoretical answers, too.)
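(Not part of the original post: a minimal sketch of such a benchmark harness
in Python. `run_in_list` and `run_union_join` are hypothetical callables that
would each issue one batch of lookups through the database driver; taking the
best of several runs damps warm-up and caching noise.)

```python
import time

def bench(fn, repeats=5):
    """Call fn() `repeats` times; return the best wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

# Usage (names are placeholders for the two query strategies):
#   in_list_time = bench(run_in_list)
#   union_time   = bench(run_union_join)
```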
--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise