From: | Junaili Lie <junaili(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Help with rewriting query |
Date: | 2005-06-10 01:30:37 |
Message-ID: | 8d04ce9905060918306fc4afb8@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Kevin,
Thanks for the reply.
I tried that query. It definately faster, but not fast enough (took
around 50 second to complete).
I have around 2.5 million on food and 1000 on person.
Here is the query plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..11662257.52 rows=1441579 width=16)
Merge Cond: ("outer".id = "inner".p_id)
-> Index Scan using person_pkey on person p (cost=0.00..25.17
rows=569 width=8)
-> Index Scan using p_id_food_index on food f
(cost=0.00..11644211.28 rows=1441579 width=16)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using p_id_food_index on food f2
(cost=0.00..11288.47 rows=2835 width=177)
Index Cond: (p_id = $0)
Filter: (id > $1)
(9 rows)
I appreciate if you have further ideas to troubleshoot this issue.
Thank you!
On 6/8/05, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> This is a pattern which I've seen many of times. I call it a "best
> choice" query -- you can easily match a row from one table against any
> of a number of rows in another, the trick is to pick the one that
> matters most. I've generally found that I want the query results to
> show more than the columns used for making the choice (and there can be
> many), which rules out the min/max technique. What works in a pretty
> straitforward way, and generally optimizes at least as well as the
> alternatives, is to join to the set of candidate rows and add a "not
> exists" test to eliminate all but the best choice.
>
> For your example, I've taken some liberties and added hypothetical
> columns from both tables to the result set, to demonstrate how that
> works. Feel free to drop them or substitute actual columns as you see
> fit. This will work best if there is an index for the food table on
> p_id and id. Please let me know whether this works for you.
>
> select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
> from food f join person p
> on f.p_id = p.id
> and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
> f.id)
> order by p_id
>
> Note that this construct works for inner or outer joins and works
> regardless of how complex the logic for picking the best choice is. I
> think one reason this tends to optimize well is that an EXISTS test can
> finish as soon as it finds one matching row.
>
> -Kevin
>
>
> >>> Junaili Lie <junaili(at)gmail(dot)com> 06/08/05 2:34 PM >>>
> Hi,
> I have the following table:
> person - primary key id, and some attributes
> food - primary key id, foreign key p_id reference to table person.
>
> table food store all the food that a person is eating. The more recent
> food is indicated by the higher food.id.
>
> I need to find what is the most recent food a person ate for every
> person.
> The query:
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)
>
> Thank you in advance.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
> joining column's datatypes do not match
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
From | Date | Subject | |
---|---|---|---|
Next Message | Yann Michel | 2005-06-10 04:57:02 | Re: postgresql.conf runtime statistics default |
Previous Message | Junaili Lie | 2005-06-10 01:26:09 | Re: Help with rewriting query |