Re: Performance of complicated query

From: Jonathan Morra <jonmorra(at)gmail(dot)com>
To: james(at)mansionfamily(dot)plus(dot)com
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of complicated query
Date: 2013-05-28 14:43:32
Message-ID: CAF8LAAUp2FyirH42W=whC7ao7HGrH6UvGWcj-DPLq4Pbn53suw@mail.gmail.com
Lists: pgsql-performance

I have been working on this query, and I was able to modify it and cut its
run time in half. Here's where it is right now:

SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC)
               AS serial_number
    FROM (
        SELECT patient_devices.device_id, patient_id,
               MAX(value - issued_value) AS value,
               MAX(read_datetime) AS read_datetime
        FROM read_reads
        INNER JOIN patient_devices
                ON patient_devices.device_id = read_reads.device_id
               AND read_datetime >= issuance_datetime
               AND read_datetime <  COALESCE(unassignment_datetime,
                                             'infinity'::timestamp)
        WHERE read_datetime BETWEEN '2012-01-01 10:30:01'
                                AND '2013-05-18 03:03:42'
        GROUP BY patient_devices.id
    ) AS first
    INNER JOIN devices ON devices.id = device_id
    INNER JOIN patients ON patient_id = patients.id
) AS filtered
INNER JOIN (
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices
                ON patient_devices.device_id = read_reads.device_id
               AND read_datetime >= issuance_datetime
               AND read_datetime <  COALESCE(unassignment_datetime,
                                             'infinity'::timestamp)
        GROUP BY patient_devices.id
    ) AS first
    GROUP BY patient_id
) AS lifetime ON filtered.patient_id = lifetime.patient_id;

I think the key to cutting it down was moving some of the joins up a level.
Even though this is faster, I'd still like to cut it down a bunch more (as
this will be run a lot in my application). Any more insight would be
greatly appreciated. A summary of explain (analyze, buffers) can be found
at http://explain.depesz.com/s/qx7f.
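
For reference, here is roughly the same query restructured with WITH clauses,
as discussed in the quoted thread below. This is only a sketch that reuses the
table and column names above. One caveat on the question asked below: in
PostgreSQL versions before 12 (which includes everything current at the time
of this thread), a CTE is an optimization fence, so the WITH form can change
the plan, and with it performance, not only readability.

WITH windowed AS (
    SELECT patient_devices.device_id, patient_id,
           MAX(value - issued_value) AS value,
           MAX(read_datetime)        AS read_datetime
    FROM read_reads
    INNER JOIN patient_devices
            ON patient_devices.device_id = read_reads.device_id
           AND read_datetime >= issuance_datetime
           AND read_datetime <  COALESCE(unassignment_datetime,
                                         'infinity'::timestamp)
    WHERE read_datetime BETWEEN '2012-01-01 10:30:01'
                            AND '2013-05-18 03:03:42'
    GROUP BY patient_devices.id
), filtered AS (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC)
               AS serial_number
    FROM windowed
    INNER JOIN devices ON devices.id = device_id
    INNER JOIN patients ON patient_id = patients.id
), lifetime AS (
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices
                ON patient_devices.device_id = read_reads.device_id
               AND read_datetime >= issuance_datetime
               AND read_datetime <  COALESCE(unassignment_datetime,
                                             'infinity'::timestamp)
        GROUP BY patient_devices.id
    ) AS per_assignment
    GROUP BY patient_id
)
SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM filtered
INNER JOIN lifetime ON filtered.patient_id = lifetime.patient_id;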

Thanks

On Thu, May 23, 2013 at 5:21 PM, Jonathan Morra <jonmorra(at)gmail(dot)com> wrote:

> Sorry for the messy query, I'm very new to writing these complex queries.
> I'll try and make it easier to read by using WITH clauses. However, just
> to clarify, the WITH clauses only increase readability and not performance
> in any way, right?
>
>
> On Thu, May 23, 2013 at 4:22 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
>
>> On 23/05/2013 22:57, Jonathan Morra wrote:
>>
>> I'm not sure I understand your proposed solution. There is also the
>> case to consider where the same patient can be assigned the same device
>> multiple times. In this case, the value may be reset at each assignment
>> (hence the line value - issued_value AS value from the original query).
>>
>>
>> Perhaps you could use triggers to help somewhat? At least for the
>> lifetime part (see the sketch after this quoted thread).
>>
>> For a given assignment of a device to a patient, only the last value is
>> useful, so you can maintain that easily enough (a bit like a materialised
>> view but before 9.3 I guess).
>>
>> But, that might fix 'lifetime' but not some arbitrary windowed view. I
>> can see why an 'as at' end time is useful, but not why a start time is so
>> useful: if a device has readings before the window but not in the window,
>> is that 'no reading' or should the last reading prior to the window apply?
>>
>> It also seems to me that the solution you have is hard to reason about.
>> It's like a Haskell program done in one big inline fold rather than a bunch
>> of 'where' clauses, and I find these cause significant brain overload.
>>
>> Perhaps you could break it out into identifiable chunks that work out
>> (both for lifetime if not using triggers, and for your date range
>> otherwise) the readings that are not superseded (i.e. the last in the date
>> bounds for a device assignment), and then work with those. Consider
>> CTEs ('WITH queries') for doing this?
>>
>> It seems to me that if you can do this, then the problem might be easier
>> to express.
>>
>> Failing that, I'd be looking at using temporary tables, and forcing a
>> series of reduce steps using them (a rough sketch follows this quoted
>> thread as well), but then I'm a nasty old Sybase hacker at heart. ;-)
>>
>>
>
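
As a rough illustration of the trigger idea quoted above: a summary table
could hold the latest adjusted value per device assignment and be kept up to
date on INSERT. Everything here is a sketch; the table name, the column
types, the assumption that issued_value is a column of patient_devices, and
the function body are all guesses, and it ignores UPDATEs, DELETEs and
concurrent inserts entirely.

-- Hypothetical summary table: one row per device assignment.
CREATE TABLE assignment_latest (
    patient_device_id integer PRIMARY KEY,   -- patient_devices.id
    patient_id        integer   NOT NULL,
    value             bigint    NOT NULL,    -- running MAX(value - issued_value)
    read_datetime     timestamp NOT NULL     -- running MAX(read_datetime)
);

CREATE OR REPLACE FUNCTION maintain_assignment_latest() RETURNS trigger AS $$
DECLARE
    pd patient_devices%ROWTYPE;
BEGIN
    -- Find the assignment this reading falls into.
    SELECT * INTO pd
      FROM patient_devices
     WHERE device_id = NEW.device_id
       AND NEW.read_datetime >= issuance_datetime
       AND NEW.read_datetime <  COALESCE(unassignment_datetime,
                                         'infinity'::timestamp);
    IF NOT FOUND THEN
        RETURN NEW;  -- reading outside any assignment; nothing to maintain
    END IF;

    -- Pre-9.5 upsert: try the UPDATE first, INSERT if no row was there.
    UPDATE assignment_latest
       SET value         = GREATEST(value, NEW.value - pd.issued_value),
           read_datetime = GREATEST(read_datetime, NEW.read_datetime)
     WHERE patient_device_id = pd.id;
    IF NOT FOUND THEN
        INSERT INTO assignment_latest
            (patient_device_id, patient_id, value, read_datetime)
        VALUES
            (pd.id, pd.patient_id, NEW.value - pd.issued_value,
             NEW.read_datetime);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER read_reads_latest
    AFTER INSERT ON read_reads
    FOR EACH ROW EXECUTE PROCEDURE maintain_assignment_latest();

With something like this in place, the lifetime totals reduce to a plain
GROUP BY over assignment_latest instead of a scan of read_reads.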

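And a minimal sketch of the temporary-table idea from the same reply:
materialise the per-assignment reduction once, ANALYZE it so the planner has
real row counts, then aggregate from it. The table name is made up for the
illustration.

-- Step 1: one row per device assignment (the expensive reduction).
CREATE TEMP TABLE per_assignment AS
SELECT patient_devices.id AS patient_device_id,
       patient_id,
       MAX(value - issued_value) AS value,
       MAX(read_datetime)        AS read_datetime
FROM read_reads
INNER JOIN patient_devices
        ON patient_devices.device_id = read_reads.device_id
       AND read_datetime >= issuance_datetime
       AND read_datetime <  COALESCE(unassignment_datetime,
                                     'infinity'::timestamp)
GROUP BY patient_devices.id, patient_id;

-- Give the planner real statistics before the next step.
ANALYZE per_assignment;

-- Step 2: one row per patient.
SELECT patient_id, SUM(value) AS lifetime_value
FROM per_assignment
GROUP BY patient_id;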