Re: Help with rewriting query

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <junaili(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Help with rewriting query
Date: 2005-06-09 04:01:35
Message-ID: s2a778e0.061@gwmta.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

This is a pattern which I've seen many of times. I call it a "best
choice" query -- you can easily match a row from one table against any
of a number of rows in another, the trick is to pick the one that
matters most. I've generally found that I want the query results to
show more than the columns used for making the choice (and there can be
many), which rules out the min/max technique. What works in a pretty
straitforward way, and generally optimizes at least as well as the
alternatives, is to join to the set of candidate rows and add a "not
exists" test to eliminate all but the best choice.

For your example, I've taken some liberties and added hypothetical
columns from both tables to the result set, to demonstrate how that
works. Feel free to drop them or substitute actual columns as you see
fit. This will work best if there is an index for the food table on
p_id and id. Please let me know whether this works for you.

select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
from food f join person p
on f.p_id = p.id
and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
f.id)
order by p_id

Note that this construct works for inner or outer joins and works
regardless of how complex the logic for picking the best choice is. I
think one reason this tends to optimize well is that an EXISTS test can
finish as soon as it finds one matching row.

-Kevin


>>> Junaili Lie <junaili(at)gmail(dot)com> 06/08/05 2:34 PM >>>
Hi,
I have the following table:
person - primary key id, and some attributes
food - primary key id, foreign key p_id reference to table person.

table food store all the food that a person is eating. The more recent
food is indicated by the higher food.id.

I need to find what is the most recent food a person ate for every
person.
The query:
select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
by f.p_id will work.
But I understand this is not the most efficient way. Is there another
way to rewrite this query? (maybe one that involves order by desc
limit 1)

Thank you in advance.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
joining column's datatypes do not match

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shanmugasundaram Doraisamy 2005-06-09 04:40:42 How to find the size of a database - reg.
Previous Message Steve Pollard 2005-06-09 03:56:57 Re: Importing from pg_dump slow, low Disk IO