From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Alessandro Gagliardi <alessandro(at)path(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: efficient data reduction (and deduping) |
Date: | 2012-03-01 19:44:22 |
Message-ID: | CAGTBQpY-e-TTnd-+7wWeKZ8ecYjdjqeRt9LRN15toxG0To_o4g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, Mar 1, 2012 at 4:39 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> Interesting solution. If I'm not mistaken, this does solve the problem of
>> having two entries for the same user at the exact same time (which violates
>> my pk constraint) but it does so by leaving both of them out (since there is
>> no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right?
>
> Yes, but it would have to be same *exact* time (not same hour).
>
> You can use more fields to desambiguate too, ie:
>
> au1.hr_timestamp > au2.hr_timestamp or (au1.hr_timestamp ==
> au2.hr_timestamp and au1.some_other_field > au2.some_other_field)
>
> If you have a sequential id to use in desambiguation, it would be best.
Sorry for double posting - but you can also *generate* such an identifier:
create sequence temp_seq;
with identified_au as ( select nextval('temp_seq') as id, * from
hourly_activity )
INSERT INTO hourly_activity
SELECT ... everything from au1 ...
FROM identified_au au1
LEFT JOIN identified_au au2 ON au2.user_id = au1.user_id
AND
date_trunc('hour', au2.hr_timestamp) = date_trunc('hour',
au1.hr_timestamp)
AND
au2.hr_timestamp < au1.hr_timestamp OR (au2.hr_timestamp =
au1.hr_timestamp AND au2.id < au1.id)
WHERE au2.user_id is null;
Should work if you have 9.x
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2012-03-01 20:12:02 | Re: efficient data reduction (and deduping) |
Previous Message | Alessandro Gagliardi | 2012-03-01 19:43:54 | Re: efficient data reduction (and deduping) |