Re: Optimizing execution of expensive subqueries

From: Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com>
To: hivs77(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimizing execution of expensive subqueries
Date: 2018-07-13 13:34:39
Message-ID: CAHoiPjyJ5RiMRYS3F6htsohhTKvV-tMNygGu=tVp6bNTh+f2hA@mail.gmail.com
Lists: pgsql-general

Hi Hellmuth,

Thanks for the response and the new approach; a lateral join is new to me.
Unfortunately, it seems to have the same performance characteristics and
query plan. The aggregation in the lateral join still executes for every
row (e.g. if my base query has 500,000 rows, I get "Aggregate (...
loops=500000)" in the query plan), unaffected by the LIMIT node later in
the plan.
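
(For reference, roughly what I tested; a sketch using the hypothetical
table and column names from the examples below, wrapped in EXPLAIN
(ANALYZE) to see the per-row aggregate loops:)

EXPLAIN (ANALYZE)
SELECT
tbl.field1, tbl.field2, tbl.field3,
b.Thingy1Sum
FROM
tbl
LEFT JOIN LATERAL (
SELECT SUM(Duration) as Thingy1Sum
FROM anothertbl
WHERE anothertbl.UserId = tbl.UserId AND anothertbl.ThingyId = 1
) as b ON true
ORDER BY tbl.field1 LIMIT 20;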

The CTE approach seems to be the only one I can use to improve performance
right now, but requires significant application code changes.

Mathieu

On Wed, Jul 11, 2018 at 1:55 PM Hellmuth Vargas <hivs77(at)gmail(dot)com> wrote:

> Hi
>
> Try this way:
>
> SELECT
> tbl.field1, tbl.field2, tbl.field3, ...,
> b.Thingy1Sum,
> ... repeat for multiple thingies ...
> FROM
> tbl
> LEFT JOIN LATERAL (
> SELECT anothertbl.UserId,SUM(Duration) as Thingy1Sum
> FROM anothertbl
> WHERE anothertbl.UserId = tbl.UserId AND anothertbl.ThingyId = 1
> group by 1) as b on tbl.UserId=b.UserId
> ORDER BY tbl.field1 LIMIT 20
>
>
> On Wed, Jul 11, 2018 at 9:25 AM, Mathieu Fenniak (
> mathieu(dot)fenniak(at)replicon(dot)com) wrote:
>
>> Hi pgsql-general!
>>
>> I'm currently looking at a query that is generally selecting a bunch of
>> simple columns from a table, and also performing some subqueries to
>> aggregate related data, and then sorting by one of the simple columns and
>> paginating the result.
>>
>> eg.
>>
>> SELECT
>> tbl.field1, tbl.field2, tbl.field3, ...,
>> (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId
>> AND anothertbl.ThingyId = 1) as Thingy1Sum,
>> ... repeat for multiple thingies ...
>> FROM
>> tbl
>> ORDER BY tbl.field1 LIMIT 20
>>
>> I'm finding that if "tbl" contains hundreds of thousands of rows, the
>> subqueries are being executed hundreds of thousands of times. Because of
>> the sorting and pagination, this appears to be unnecessary, and the
>> result is slow performance. (PostgreSQL 9.5.9 server)
>>
>> I've only found one solution so far, which is to perform the sort &
>> pagination in a CTE, and the subqueries externally. Are there any other
>> approaches that can be taken to optimize this and prevent the unnecessary
>> computation?
>>
>> CTE rewrite:
>>
>> WITH cte AS (
>> SELECT
>> tbl.UserId, tbl.field1, tbl.field2, tbl.field3
>> FROM
>> tbl
>> ORDER BY tbl.field1 LIMIT 20
>> )
>> SELECT cte.*,
>> (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId
>> = cte.UserId AND anothertbl.ThingyId = 1) as Thingy1Sum,
>> ... repeat for multiple thingies ...
>> FROM cte;
>>
>> Thanks for any thoughts you have,
>>
>> Mathieu Fenniak
>>
>
>
> --
> Cordially,
>
> Ing. Hellmuth I. Vargas S.
>
>
>
