From: | agharta <agharta82(at)gmail(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Retrieve most recent 1 record from joined table |
Date: | 2014-08-22 08:05:46 |
Message-ID: | 53F6F9DA.4060005@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi all,
This is my first question, don't hate me please if it is the wrong place.
Mi question is relative seimple, but goes me crazy.
First, create some example tables:
----
create table table1(
t1_ID SERIAL primary key,
t1_value text
);
create table table2(
t2_ID SERIAL primary key,
t2_value text,
t1_id integer
);
create table table3(
t3_ID SERIAL primary key,
t3_value text,
t3_date timestamp,
t2_id integer
);
ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON
DELETE CASCADE;
ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON
DELETE CASCADE;
CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
-----
As you can see, table3 is referenced to table2 and table2 is referenced
to table1
Next, insert some data on tables:
--one millon records into table1
insert into table1(t1_value) select md5(random()::text) from (select
generate_series(1,1000000) as a) as c;
--100.000 records on table2 where t1_id is a random number between 1 and
10.001. This guarantees many records pointing to same table1 t1_id
insert into table2(t2_value, t1_id) select md5(random()::text),
trunc((random()+1)*10000)::integer from (select
generate_series(1,100000) as a) as c;
--again, 1.022.401 records on table3 where t2_id is a random number
between 1 and 10.001. This guarantee many records pointing to same
table2 t2_id.
-- random numbers (1.022.401) are generated by generated_series function
with timestamp (1 minute) interval
insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
trunc((random()+1)*10000)::integer, c.date_val from (select
generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval
'1 minute') as date_val) as c;
So, now we should have table3 with many rows per table2(t2_id) and many
rows in table2 with same t1_id.
Now, the question:
Joining the tables, how to get ONLY most recent record per table3(t3_date)??
Query example:
select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp
'2014-08-20')
order by t3.t2_id, t3.t3_date desc
produces to me this dataset (extract)
t1_id t1_value t2_id t2_value t1_id
t3_id t3_value t3_date
t2_id
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 857683
0fb6cb380522dd83b6ac4beba2c6e98f 2014-08-19 14:42:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 857088
9c3481bfc0bdea51e62b338a1777cde6 2014-08-19 04:47:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 853208
253dc2083e70dd9e276867f43889973f 2014-08-16 12:07:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 851237
970acf901c4232c178b5dbeda4e44ac8 2014-08-15 03:16:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 847436
cb8d52255eaa850f9d7f203092a2ce13 2014-08-12 11:55:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 819339
d9e36ad622b1db499b3f623cdd46a811 2014-07-23 23:38:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 818022
b9d13239f522e3a22f20d36ea6dab8ad 2014-07-23 01:41:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 803046
c36164f3077894a1986c4922dfb632ec 2014-07-12 16:05:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 788129
32c966feab2212a29f86bebbaa6dfec9 2014-07-02 07:28:00.0 10020
As you can see, there are many t3_id per single t2_id.
I need the same dataset, but i want only the most recent (one) record
per table3, and, of course, directly in the join rule (in production i
have a long and complex query).
My personal solution (very slow, i can't use it):
select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (
t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date
<= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)
and t3.t2_id = t2.t2_id
)
order by t3.t2_id, t3.t3_date desc
it gives me the right result, but performances are poor........
Same bad performance result with a function that performs the table3 query.
Suggestions?
Thanks to anyone who can answer to me!!!!
Best regards,
Agharta
From | Date | Subject | |
---|---|---|---|
Next Message | Brice André | 2014-08-22 08:13:13 | Re: Retrieve most recent 1 record from joined table |
Previous Message | Adrian Klaver | 2014-08-18 22:16:53 | Re: function call |