From: | Fran Fabrizio <ffabrizio(at)mmrd(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | getting most recent row efficiently |
Date: | 2001-12-17 21:30:18 |
Message-ID: | 3C1E63EA.D46B8B58@mmrd.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I've got a table that keeps a log of a person's favorites over time
(what follows is a simplified example)
person_id favorite_color
1 red
1 blue
2 green
3 yellow
3 purple
I want the set of most recent entries for each person. So assuming the
order they appear above is the order they were inserted, I want:
person_id favorite_color
1 blue
2 green
3 purple
Is there any way to retrieve "the most recent row for each distinct
person_id?" without making a timestamp column for when the row was
inserted and then grouping them by person_id?
What I include below is the real world example. There's a table,
doctor_favorites, that functions like my sample table above. There's a
view, current_doctor_favorites, which contains the most recent row for
each doctor. This view has a high cost as you can see. There's a
second view, doctor_metrics, which combines the data from the
current_doctor_favorites view with that of another table, and that is my
ultimate goal. It's taking a loooong time to run, I think because of
the sequential scan when creating the current_doctor_favorites view from
the doctor_favorites table. I'm wondering if there's a way to perhaps
use an OID or something to determine the most recent rows from
doctor_favorites rather than the timestamp and the subselect. I tried
to create an index on the local_ts timestamp field of the underlying
doctor_favorites table, but it's hard to index something that's
basically unique each time you insert a row anyhow.
monitor-prod=# \d doctor_favorites
Table "doctor_favorites"
Attribute | Type | Modifier
-----------------+--------------------------+------------------------
favorites | integer |
remote_ts | timestamp with time zone | not null
local_ts | timestamp with time zone | not null default now()
med_practice_id | integer |
doctor_id | integer |
Indices: docid_index,
docid_medpracid_index,
localts_index,
medpracid_index
monitor-test=# \d current_doctor_favorites
View "current_doctor_favorites"
Attribute | Type | Modifier
-----------------+---------+----------
doctor_id | integer |
med_practice_id | integer |
favorites | integer |
View definition: SELECT df.doctor_id, df.med_practice_id, df.favorites
FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));
monitor-test=# explain SELECT df.doctor_id, df.med_practice_id,
df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));
NOTICE: QUERY PLAN:
Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12)
SubPlan
-> Aggregate (cost=2.02..2.02 rows=1 width=8)
-> Index Scan using docid_medpracid_index on
doctor_favorites (cost=0.00..2.02 rows=1 width=8)
EXPLAIN
monitor-test=#
monitor-prod=# \d doctor_metrics
View "doctor_metrics"
Attribute | Type | Modifier
-----------------+---------+----------
doctor_id | integer |
med_practice_id | integer |
lab_requests | integer |
lab_results | integer |
rx_auth | integer |
transcriptions | integer |
omnidocs | integer |
phone | integer |
favorites | integer |
View definition: SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN
doctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));
monitor-prod=# explain SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL)
THEN do
ctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));
NOTICE: QUERY PLAN:
Merge Join (cost=9566.05..9640.75 rows=1000 width=44)
-> Index Scan using doctor_tasks_pkey on doctor_tasks
(cost=0.00..59.00 rows=1000 width=32)
-> Sort (cost=9566.05..9566.05 rows=47 width=12)
-> Subquery Scan current_doctor_favorites (cost=0.00..9564.75
rows=47 width=12)
-> Seq Scan on doctor_favorites df (cost=0.00..9564.75
rows=47 width=12)
SubPlan
-> Aggregate (cost=2.02..2.02 rows=1 width=8)
-> Index Scan using docid_medpracid_index
on doctor_favorites (cost=0.00..2.02 rows=1 width=8)
EXPLAIN
monitor-prod=#
Thanks for your feedback,
Fran
From | Date | Subject | |
---|---|---|---|
Next Message | SHELTON,MICHAEL (Non-HP-Boise,ex1) | 2001-12-17 22:11:03 | Re: getting most recent row efficiently |
Previous Message | Darren Ferguson | 2001-12-17 21:10:13 | Re: Query Help |