Re: Please advice on query optimization

From: Muhannad Shubita <muhannadshubita(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Samed YILDIRIM <samed(at)reddoc(dot)net>, "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Please advice on query optimization
Date: 2017-04-20 20:07:35
Message-ID: CAJQCA5FNag6FzqqDQZHG=L3q-NR1M0apvxiua1bohpNxAn1_Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Thanks folks for your help, I will benchmark the suggested solutions & see
what's the best fit here.

Awesome mailing list :)

On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> Much easier to follow bottom-posting but I'll be consistent here.
>
> It seems like you want a join of 4 sub-queries.
>
> SELECT *
> FROM unique_day_employee
> LEFT JOIN aggregates_query USING (day, employee)
> LEFT JOIN first_signin_query USING (day, employee)
> LEFT JOIN last_signin_query USING (day, employee)
>
> Where the definition of the first/last sign-in sub-queries are like
>
> SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl
> ORDER BY day, employee, time ASC -- first
> SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl
> ORDER BY day, employee, time DESC -- last
>
> David J.
>
>
> On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <
> muhannadshubita(at)gmail(dot)com> wrote:
>
>> Thanks Samed,
>>
>> one more question, what if I had other columns that cannot be used with
>> an aggregate function (TEXT-based for example) but still needed to be
>> paired with the first & last sign in? for instance, a randomly generated
>> md5-ed ID by the sign in device that needs to be displayed as a reference:
>>
>> Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In
>> device-id-of-last-sign-in
>>
>> 20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..
>>
>>
>> On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed(at)reddoc(dot)net> wrote:
>>
>>> Hi Muhannad,
>>>
>>> Did you try using MIN and MAX function? I guess that following query
>>> solves your problem.
>>>
>>> select employee_id, day, sum(total_hours) as total_hours, (select name
>>> from employee where id = employee_id) as emp_name, *min(sign_in) as
>>> first_sign_in, max(sign_in) as last_sign_in*
>>> from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day
>>> ORDER BY day;
>>>
>>> Best regards.
>>>
>>>
>>>
>>>
>>> İyi çalışmalar.
>>> Samed YILDIRIM
>>>
>>>
>>>
>>> 20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita(at)gmail(dot)com>:
>>>
>>> Good day,
>>>
>>> I have a table with biometric info about employees (let's call it
>>> bioemployee):
>>>
>>> id serial not null -- primary key
>>> employee_id -- foreign key references employee table
>>> day char(8) -- YY-MM-DD
>>> sign_in TIMESTAMP
>>> total_hours INTEGER
>>>
>>> I want to display details grouped by day & employee Id, for example:
>>>
>>>
>>> Day Employee Total-Hours First-Sign-In Last-Sign-In
>>>
>>> 20/4 emp1 4 8:22 3:25
>>> 21/4 emp1 7 9:00 4:11
>>> 21/4 emp2 2 11:00 01:11
>>>
>>>
>>> I created a pgsql function to get the details through the below query:
>>>
>>> select employee_id, day, sum(total_hours) as total_hours, (select name
>>> from employee where id = employee_id) as emp_name
>>> from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day
>>> ORDER BY day;
>>>
>>> now the problem is with getting First-Sign-In & Last-Sign-In per group
>>> (employee & day), I have currently implemented it in a FOR loop:
>>>
>>> for RECORD in query LOOP
>>> --First-Sign-In
>>> select sign_in from bioemployee where employee_id = -- and day = -- and
>>> sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
>>> --Last-Sign-In
>>> select sign_in from bioemployee where employee_id = -- and day = -- and
>>> sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
>>> return next json_build_object(first_sign_in, last_sign_in, ..rest of
>>> details);
>>> END LOOP
>>>
>>> but If I had 100 employees over a span of 30 days, this would be 6000
>>> queries inside the loop! which I am sure you would agree is an overkill
>>>
>>>
>>> is there a better way to do this?
>>>
>>>
>>> Thanks.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Muhannad
>>
>
>

--
Regards,
Muhannad

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Hector Vass 2017-04-20 22:04:01 Re: Please advice on query optimization
Previous Message David G. Johnston 2017-04-20 19:37:03 Re: Please advice on query optimization