Re: Worse performance with higher work_mem?

From: Justin <zzzzz(dot)graf(at)gmail(dot)com>
To: Israel Brewster <ijbrewster(at)alaska(dot)edu>
Cc: Rob Sargent <robjsargent(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Worse performance with higher work_mem?
Date: 2020-01-14 17:32:00
Message-ID: CALL-XeNfQH9eCU=KOSe592ij36HY_XWQL4OkzpT7m8FVTJJzmA@mail.gmail.com
Lists: pgsql-general

Updating the stats can be done via the VACUUM or ANALYZE command; see
https://www.postgresql.org/docs/12/sql-analyze.html. Analyzing just one
table typically does not take much time, and it can be scheduled to run so
the stats stay current instead of waiting on autovacuum to deal with it,
which could be some time on an insert-only table.
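
A minimal sketch ("readings" here is only a stand-in for your actual table
name):

    ANALYZE readings;               -- refresh planner statistics for the whole table
    ANALYZE readings (read_time);   -- or limit it to the column(s) the query filters on

Either form could be run from a cron job between autovacuum passes.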

Seeing the difference in speed between the first run with low work_mem vs
high work_mem with parallelization, I suspect the temp files never actually
got written to disk; they just hung out in the OS I/O cache.
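
(As an aside, work_mem can be set per session, so the "lower work_mem, run
the query, set it back" approach mentioned at the end of the thread doesn't
require touching the server config. A rough sketch, with an arbitrary value:

    SET work_mem = '4MB';                    -- affects only this session
    EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- re-run the query of interest here
    RESET work_mem;                          -- back to the configured default
)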

In all of the example plans, the query is hung up doing the index scan and
running the avg() aggregate.

Maybe you can look at creating a summary table for time periods to work
against; a weekly or daily summary of these values could cut down on the
number of records being processed without affecting the result.
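
A rough sketch of that idea (table and column names other than read_time
are guesses -- adjust to your actual schema):

    -- Hypothetical daily rollup, refreshed on a schedule (e.g. nightly).
    CREATE MATERIALIZED VIEW readings_daily AS
    SELECT read_time::date AS read_date,
           sum(value)      AS sum_value,    -- keep sum and count so averages
           count(*)        AS n_readings    -- over wider ranges stay exact
      FROM readings
     GROUP BY read_time::date;

    REFRESH MATERIALIZED VIEW readings_daily;   -- run after each day's inserts

An exact average over any range of days is then sum(sum_value) /
sum(n_readings), so queries over long time ranges can hit readings_daily
instead of the raw table and touch far fewer rows.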

On Tue, Jan 14, 2020 at 12:08 PM Israel Brewster <ijbrewster(at)alaska(dot)edu>
wrote:

>
> On Jan 13, 2020, at 3:46 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>
>
>
> On Jan 13, 2020, at 5:41 PM, Israel Brewster <ijbrewster(at)alaska(dot)edu>
> wrote:
>
> On Jan 13, 2020, at 3:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Israel Brewster <ijbrewster(at)alaska(dot)edu> writes:
>
> In looking at the explain analyze output, I noticed that it had an
> “external merge Disk” sort going on, accounting for about 1 second of the
> runtime (explain analyze output here: https://explain.depesz.com/s/jx0q).
> Since the machine has plenty of RAM
> available, I went ahead and increased the work_mem parameter. Whereupon the
> query plan got much simpler, and performance of said query completely
> tanked, increasing to about 15.5 seconds runtime (
> https://explain.depesz.com/s/Kl0S),
> most of which was in a HashAggregate.
> How can I fix this? Thanks.
>
>
> Well, the brute-force way not to get that plan is "set enable_hashagg =
> false". But it'd likely be a better idea to try to improve the planner's
> rowcount estimates. The problem here seems to be lack of stats for
> either "time_bucket('1 week', read_time)" or "read_time::date".
> In the case of the latter, do you really need a coercion to date?
> If it's a timestamp column, I'd think not. As for the former,
> if the table doesn't get a lot of updates then creating an expression
> index on that expression might be useful.
>
>
> Thanks for the suggestions. Disabling hash aggregates actually made things
> even worse: (https://explain.depesz.com/s/cjDg), so even if that wasn’t a
> brute-force option, it doesn’t appear to be a good one. Creating an index
> on the time_bucket expression didn’t seem to make any difference, and my
> data does get a lot of additions (though virtually no changes) anyway
> (about 1 additional record per second). As far as coercion to date, that’s
> so I can do queries bounded by date, and actually have all results from
> said date included. That said, I could of course simply make sure that when
> I get a query parameter of, say, 2020-1-13, I expand that into a full
> date-time for the end of the day. However, doing so for a test query didn’t
> seem to make much of a difference either:
> https://explain.depesz.com/s/X5VT
>
> So, to summarise:
>
> Set enable_hashagg=off: worse
> Index on time_bucket expression: no change in execution time or query plan
> that I can see
> Get rid of coercion to date: *slight* improvement. 14.692 seconds instead
> of 15.5 seconds. And it looks like the row count estimates were actually
> worse.
> Lower work_mem, forcing a disk sort and completely different query plan:
> Way, way better (around 6 seconds)
>
> …so so far, it looks like the best option is to lower the work_mem, run
> the query, then set it back?
> ---
>
>
> I don’t see that you’ve updated the statistics?
>
>
> Ummmm….no. I know nothing about that :-)
>
> Some research tells me that a) it should happen as part of the autovacuum
> process, and that b) I may not be running autovacuum enough, since it is a
> large table and doesn’t change often. But I don’t really know.
>
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell: 907-328-9145
>
>
