Retrieve most recent 1 record from joined table

From: agharta <agharta82(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Retrieve most recent 1 record from joined table
Date: 2014-08-22 08:05:46
Message-ID: 53F6F9DA.4060005@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi all,
This is my first question, don't hate me please if it is the wrong place.

Mi question is relative seimple, but goes me crazy.

First, create some example tables:

----
create table table1(
t1_ID SERIAL primary key,
t1_value text
);

create table table2(
t2_ID SERIAL primary key,
t2_value text,
t1_id integer
);

create table table3(
t3_ID SERIAL primary key,
t3_value text,
t3_date timestamp,
t2_id integer
);

ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1 (t1_ID) ON
DELETE CASCADE;
ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2 (t2_ID) ON
DELETE CASCADE;

CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
-----

As you can see, table3 is referenced to table2 and table2 is referenced
to table1

Next, insert some data on tables:

--one millon records into table1
insert into table1(t1_value) select md5(random()::text) from (select
generate_series(1,1000000) as a) as c;

--100.000 records on table2 where t1_id is a random number between 1 and
10.001. This guarantees many records pointing to same table1 t1_id
insert into table2(t2_value, t1_id) select md5(random()::text),
trunc((random()+1)*10000)::integer from (select
generate_series(1,100000) as a) as c;

--again, 1.022.401 records on table3 where t2_id is a random number
between 1 and 10.001. This guarantee many records pointing to same
table2 t2_id.
-- random numbers (1.022.401) are generated by generated_series function
with timestamp (1 minute) interval
insert into table3(t3_value, t2_id, t3_date) select md5(random()::text),
trunc((random()+1)*10000)::integer, c.date_val from (select
generate_series(timestamp '2013-01-01',timestamp '2014-12-12', interval
'1 minute') as date_val) as c;

So, now we should have table3 with many rows per table2(t2_id) and many
rows in table2 with same t1_id.

Now, the question:

Joining the tables, how to get ONLY most recent record per table3(t3_date)??

Query example:

select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <= timestamp
'2014-08-20')
order by t3.t2_id, t3.t3_date desc

produces to me this dataset (extract)

t1_id t1_value t2_id t2_value t1_id
t3_id t3_value t3_date
t2_id
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 857683
0fb6cb380522dd83b6ac4beba2c6e98f 2014-08-19 14:42:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 857088
9c3481bfc0bdea51e62b338a1777cde6 2014-08-19 04:47:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 853208
253dc2083e70dd9e276867f43889973f 2014-08-16 12:07:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 851237
970acf901c4232c178b5dbeda4e44ac8 2014-08-15 03:16:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 847436
cb8d52255eaa850f9d7f203092a2ce13 2014-08-12 11:55:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 819339
d9e36ad622b1db499b3f623cdd46a811 2014-07-23 23:38:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 818022
b9d13239f522e3a22f20d36ea6dab8ad 2014-07-23 01:41:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 803046
c36164f3077894a1986c4922dfb632ec 2014-07-12 16:05:00.0 10020
17098 74127bc80ca759678892c957b6a34fc7 10020
9182c1f48cd008e31d781abc45723a10 17098 788129
32c966feab2212a29f86bebbaa6dfec9 2014-07-02 07:28:00.0 10020

As you can see, there are many t3_id per single t2_id.

I need the same dataset, but i want only the most recent (one) record
per table3, and, of course, directly in the join rule (in production i
have a long and complex query).

My personal solution (very slow, i can't use it):

select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join table3 t3 on (
t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and t3_date
<= timestamp '2014-08-20' order by t3_date desc fetch first 1 rows only)
and t3.t2_id = t2.t2_id
)
order by t3.t2_id, t3.t3_date desc

it gives me the right result, but performances are poor........

Same bad performance result with a function that performs the table3 query.

Suggestions?

Thanks to anyone who can answer to me!!!!

Best regards,

Agharta

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Brice André 2014-08-22 08:13:13 Re: Retrieve most recent 1 record from joined table
Previous Message Adrian Klaver 2014-08-18 22:16:53 Re: function call