Re: performance of analytical query

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jiří Fejfar <jurafejfar(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: performance of analytical query
Date: 2021-11-12 02:41:51
Message-ID: 20211112024151.GF17618@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Nov 11, 2021 at 08:20:57PM +0100, Jiří Fejfar wrote:
> Hi folks,
>
> we have found that (probably after VACUUM ANALYZE) one analytical query
> starts to be slow on our production DB. Moreover, more or less the same
> plan is used on our testing data (how to restore our testing data is
> described at the end of this email), or better to say the same problem
> exists in both (production vs testing data) scenarios: nested loop scanning
> CTE several thousand times is used due to the bad estimates:
> https://explain.dalibo.com/plan/sER#plan/node/87 (query is included on
> dalibo).

> Do you have any idea how to get HASH JOINS in the CTE w_1p_data instead of
> NESTED LOOPs?
> * Add some statistics to not get bad estimates on "lower-level" CTEs?

Do you know why the estimates are bad ?

Index Scan using t_map_plot_cell__cell_gid__idx on cm_plot2cell_mapping cm_plot2cell_mapping (cost=0.29..18.59 rows=381 width=12) (actual time=0.015..2.373 rows=3,898 loops=1)
Index Cond: (cm_plot2cell_mapping.estimation_cell = f_a_cell.estimation_cell)
Buffers: shared hit=110

I don't know, but is the estimate for this portion of the plan improved by doing:
| ALTER TABLE f_a_cell ALTER estimation_cell SET STATISTICS 500; ANALYZE f_a_cell;

> * In a slightly more complicated function I used temporary tables to be
> able to narrow statistics [2] but I am afraid of system table bloating
> because of the huge amount of usage of this function on the production
> (hundred thousand of calls by day when data are to be analyzed).

I would try this for sure - I think hundreds of calls per day would be no
problem. If you're concerned, you could add manual calls to do (for example)
VACUUM pg_attribute; after dropping the temp tables.

BTW, we disable nested loops for the our analytic report queries. I have never
been able to avoid pathological plans any other way.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ashkil Dighin 2021-11-12 14:12:30 Re: Lock contention high
Previous Message Jiří Fejfar 2021-11-11 19:20:57 performance of analytical query