From: | Hector Vass <hector(dot)vass(at)metametrics(dot)co(dot)uk> |
---|---|
To: | Muhannad Shubita <muhannadshubita(at)gmail(dot)com> |
Cc: | Samed YILDIRIM <samed(at)reddoc(dot)net>, "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
Subject: | Re: Please advice on query optimization |
Date: | 2017-04-20 22:04:01 |
Message-ID: | 92b4390f-9beb-4661-b9ba-b468dc42ccc0@email.android.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
or perhaps ...
select
day,
employee,
sum(total_hours),
max(case when r=1 then sign_in else null end) as first,
max(case when rr=1 then sign_in else null end) as last
from (
select
*,
row_number() over(partition by day,employee order by sign_in) as r,
row_number() over(partition by day,employee order by sign_in desc) as rr
from bioemployee
)x
group by day,employee
On 20 Apr 2017 9:09 p.m., Muhannad Shubita <muhannadshubita(at)gmail(dot)com> wrote:
Thanks folks for your help, I will benchmark the suggested solutions & see what's the best fit here.
Awesome mailing list :)
On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com<mailto:david(dot)g(dot)johnston(at)gmail(dot)com>> wrote:
Much easier to follow bottom-posting but I'll be consistent here.
It seems like you want a join of 4 sub-queries.
SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)
Where the definition of the first/last sign-in sub-queries are like
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last
David J.
On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita(at)gmail(dot)com<mailto:muhannadshubita(at)gmail(dot)com>> wrote:
Thanks Samed,
one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:
Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in
20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..
On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed(at)reddoc(dot)net<mailto:samed(at)reddoc(dot)net>> wrote:
Hi Muhannad,
Did you try using MIN and MAX function? I guess that following query solves your problem.
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
Best regards.
İyi çalışmalar.
Samed YILDIRIM
20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita(at)gmail(dot)com<mailto:muhannadshubita(at)gmail(dot)com>>:
Good day,
I have a table with biometric info about employees (let's call it bioemployee):
id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER
I want to display details grouped by day & employee Id, for example:
Day Employee Total-Hours First-Sign-In Last-Sign-In
20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11
I created a pgsql function to get the details through the below query:
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:
for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP
but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill
is there a better way to do this?
Thanks.
--
Regards,
Muhannad
--
Regards,
Muhannad
From | Date | Subject | |
---|---|---|---|
Next Message | Gary Stainburn | 2017-05-05 08:25:04 | Most recent row |
Previous Message | Muhannad Shubita | 2017-04-20 20:07:35 | Re: Please advice on query optimization |