Re: Retrieve most recent 1 record from joined table

From: agharta <agharta82(at)gmail(dot)com>
To: Brice André <brice(at)famille-andre(dot)be>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Retrieve most recent 1 record from joined table
Date: 2014-08-22 08:48:21
Message-ID: 53F703D5.4050507@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi André,
Thanks a lot for you quickly reply!

Your solution is right, but I'm sorry because i cannot apply a group by
with max in my production query: it does other group by and sums.

I've try another solution that gives me a better performances:

WITH filtered_data AS (
select * from (
SELECT *, rank() OVER (PARTITION BY t2_id ORDER BY t3_date
DESC) as rank FROM table3
where t3_date <= timestamp '2014-08-20'
) xx where xx.rank =1
)
select * from table1 as t1
inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value like('%ab%') )
inner join filtered_data t3 on (t3.t2_id = t2.t2_id)
order by t3.t2_id, t3.t3_date desc

So, can it be the right way to solve my problem? There are some
side-effects? Could exist a better solution, aka "1.000.000$ question"
;-) ?

Thanks again for your support!

Best regards,

Agharta

On 08/22/2014 10:13 AM, Brice André wrote:
> Try to use a "GROUP BY " clause in your select, coupled with an
> aggregate function (MAX in your case ?). Note that you should not need
> any subquery, so a single select with inner join should be OK.
>
>
> 2014-08-22 10:05 GMT+02:00 agharta <agharta82(at)gmail(dot)com
> <mailto:agharta82(at)gmail(dot)com>>:
>
> Hi all,
> This is my first question, don't hate me please if it is the wrong
> place.
>
> Mi question is relative seimple, but goes me crazy.
>
> First, create some example tables:
>
> ----
> create table table1(
> t1_ID SERIAL primary key,
> t1_value text
> );
>
> create table table2(
> t2_ID SERIAL primary key,
> t2_value text,
> t1_id integer
> );
>
>
> create table table3(
> t3_ID SERIAL primary key,
> t3_value text,
> t3_date timestamp,
> t2_id integer
> );
>
> ALTER TABLE table2 ADD FOREIGN KEY (t1_id) REFERENCES table1
> (t1_ID) ON DELETE CASCADE;
> ALTER TABLE table3 ADD FOREIGN KEY (t2_id) REFERENCES table2
> (t2_ID) ON DELETE CASCADE;
>
> CREATE INDEX IDX_TABLE1_T1_value ON table1 (t1_value);
> CREATE INDEX IDX_TABLE2_T2_value ON table2 (t2_value);
> CREATE INDEX IDX_TABLE3_T3_value ON table3 (t3_value);
> CREATE INDEX IDX_TABLE3_T3_DATE ON table3 (t3_date);
> -----
>
> As you can see, table3 is referenced to table2 and table2 is
> referenced to table1
>
> Next, insert some data on tables:
>
> --one millon records into table1
> insert into table1(t1_value) select md5(random()::text) from
> (select generate_series(1,1000000) as a) as c;
>
> --100.000 records on table2 where t1_id is a random number between
> 1 and 10.001. This guarantees many records pointing to same table1
> t1_id
> insert into table2(t2_value, t1_id) select md5(random()::text),
> trunc((random()+1)*10000)::integer from (select
> generate_series(1,100000) as a) as c;
>
> --again, 1.022.401 records on table3 where t2_id is a random
> number between 1 and 10.001. This guarantee many records pointing
> to same table2 t2_id.
> -- random numbers (1.022.401) are generated by generated_series
> function with timestamp (1 minute) interval
> insert into table3(t3_value, t2_id, t3_date) select
> md5(random()::text), trunc((random()+1)*10000)::integer,
> c.date_val from (select generate_series(timestamp
> '2013-01-01',timestamp '2014-12-12', interval '1 minute') as
> date_val) as c;
>
>
> So, now we should have table3 with many rows per table2(t2_id) and
> many rows in table2 with same t1_id.
>
>
> Now, the question:
>
>
> Joining the tables, how to get ONLY most recent record per
> table3(t3_date)??
>
> Query example:
>
> select * from table1 as t1
> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value
> like('%ab%') )
> inner join table3 t3 on (t2.t2_id = t3.t2_id and t3.t3_date <=
> timestamp '2014-08-20')
> order by t3.t2_id, t3.t3_date desc
>
> produces to me this dataset (extract)
>
>
> t1_id t1_value t2_id t2_value t1_id t3_id
> t3_value t3_date t2_id
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 857683
> 0fb6cb380522dd83b6ac4beba2c6e98f 2014-08-19 14:42:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 857088
> 9c3481bfc0bdea51e62b338a1777cde6 2014-08-19 04:47:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 853208
> 253dc2083e70dd9e276867f43889973f 2014-08-16 12:07:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 851237
> 970acf901c4232c178b5dbeda4e44ac8 2014-08-15 03:16:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 847436
> cb8d52255eaa850f9d7f203092a2ce13 2014-08-12 11:55:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 819339
> d9e36ad622b1db499b3f623cdd46a811 2014-07-23 23:38:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 818022
> b9d13239f522e3a22f20d36ea6dab8ad 2014-07-23 01:41:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 803046
> c36164f3077894a1986c4922dfb632ec 2014-07-12 16:05:00.0 10020
> 17098 74127bc80ca759678892c957b6a34fc7 10020
> 9182c1f48cd008e31d781abc45723a10 17098 788129
> 32c966feab2212a29f86bebbaa6dfec9 2014-07-02 07:28:00.0 10020
>
>
> As you can see, there are many t3_id per single t2_id.
>
> I need the same dataset, but i want only the most recent (one)
> record per table3, and, of course, directly in the join rule (in
> production i have a long and complex query).
>
> My personal solution (very slow, i can't use it):
>
> select * from table1 as t1
> inner join table2 t2 on (t1.t1_id = t2.t1_id and t2.t2_value
> like('%ab%') )
> inner join table3 t3 on (
> t3.t3_id = (select t3_id from table3 where t2_id = t2.t2_id and
> t3_date <= timestamp '2014-08-20' order by t3_date desc fetch
> first 1 rows only)
> and t3.t2_id = t2.t2_id
> )
> order by t3.t2_id, t3.t3_date desc
>
>
> it gives me the right result, but performances are poor........
>
> Same bad performance result with a function that performs the
> table3 query.
>
> Suggestions?
>
> Thanks to anyone who can answer to me!!!!
>
> Best regards,
>
> Agharta
>
>
>
>
>
>
>
>
>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org
> <mailto:pgsql-sql(at)postgresql(dot)org>)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Vik Fearing 2014-08-23 12:15:46 Re: Retrieve most recent 1 record from joined table
Previous Message Brice André 2014-08-22 08:13:13 Re: Retrieve most recent 1 record from joined table