Re: select with max functions

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
Cc: Gerardo Herzig <gherzig(at)fmed(dot)uba(dot)ar>, pgsql-performance(at)postgresql(dot)org
Subject: Re: select with max functions
Date: 2017-10-03 21:16:37
Message-ID: 002ad4b4-ec5f-4e9f-fa48-5c19405292ee@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com> writes:
>> explain analyze SELECT Ma.User_Id,
>> COUNT(*) COUNT
>> FROM Manuim Ma
>> WHERE Ma.Bb_Open_Date =
>> (SELECT Bb_Open_Date
>> FROM Manuim Man
>> WHERE Man.User_Id = Ma.User_Id order
>> by bb_open_date desc limit 1
>> )
>> GROUP BY Ma.User_Id
>> HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma,
> (SELECT User_Id, max(Bb_Open_Date) as max
> FROM Manuim Man
> GROUP BY User_Id) ss
> WHERE Ma.User_Id = ss.User_Id AND
> Ma.Bb_Open_Date = ss.max
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
> regards, tom lane
>
>

Further ideas based on Tom's rewrite: If that MAX is still expensive it
might be worth breaking

SELECT User_Id, max(Bb_Open_Date) as max
FROM Manuim Man
GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an
equivalent  trigger based summary table (there are examples in the docs
of how to do this).

Cheers

Mark

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2017-10-03 21:44:16 Re: How does max_parallel_workers_per_gather change load averages?
Previous Message Ben Nachtrieb 2017-10-03 19:48:39 How does max_parallel_workers_per_gather change load averages?