Re: Fwd: temp_file_limit?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Frits Jalvingh <jal(at)etc(dot)to>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: temp_file_limit?
Date: 2022-12-20 21:11:09
Message-ID: 20221220211109.GI1153@telsasoft.com
Lists: pgsql-performance

On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> @justin
>
> I tried the create statistics variant and that definitely improves the
> estimate, and with that one of the "bad" cases (the one with the 82 minute
> plan) now creates a good plan using only a few seconds.
> That is a worthwhile path to follow. A bit hard to do, because those
> conditions can be anything, but I can probably calculate the ones used per
> customer and create those correlation statistics from that... It is
> definitely better than tweaking the "poor man's query hints" enable_xxxx
> 8-/ which is really not helping with plan stability either.
>
> That will be a lot of work, but I'll let you know the results ;)

Yeah, if the conditions are arbitrary, then it's going to be more
difficult. Hopefully you don't have too many columns. :)

I suggest enabling auto_explain and monitoring for queries which were
slow, and retroactively adding statistics objects on the columns which
are most commonly queried together, and which have correlations (which
the planner doesn't otherwise know about).
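
Something like this (untested sketch; the 30s threshold is only an
illustration, and LOAD is session-only; server-wide you'd put
auto_explain in shared_preload_libraries instead):
| -- load auto_explain for the current session only
| LOAD 'auto_explain';
| -- log the plan of any statement that runs longer than 30s
| SET auto_explain.log_min_duration = '30s';
| -- also log plans of statements run inside functions
| SET auto_explain.log_nested_statements = on;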

You won't want to have more than a handful of columns in a stats object
(since it requires factorial(N) complexity), but you can have multiple
stats objects with different combinations of columns (and, in v14,
expressions). You can also set a lower stats target to make the cost a
bit lower.
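
For example (untested, and the table/column names are made up; ALTER
STATISTICS ... SET STATISTICS needs v13 or later):
| -- functional-dependency stats on a few columns that are often filtered together
| CREATE STATISTICS orders_corr (dependencies) ON customer_id, region, status FROM orders;
| -- optionally lower the stats target for this object to keep ANALYZE cheaper
| ALTER STATISTICS orders_corr SET STATISTICS 100;
| ANALYZE orders;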

You could try to check which columns are correlated, either by running:
| SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
for different combinations of columns.

Or by creating a tentative/experimental stats object on a handful of
columns at a time for which you have an intuition about their
correlation, and then checking the calculated dependencies from
pg_stats_ext. You may need to do something clever to use that for
arbitrary columns. Maybe this is a start.
| SELECT dep.value::float, tablename, attnames, dep.key, exprs
| FROM (SELECT (json_each_text(dependencies::text::json)).*, * FROM pg_stats_ext) dep
| WHERE dependencies IS NOT NULL
| ORDER BY 1 DESC; -- AND regexp_count(key, ',') < 2
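
And once a stats object is in place (same made-up names as above),
comparing the estimated vs. actual row counts is a quick way to confirm
it helped:
| EXPLAIN (ANALYZE)
| SELECT count(*) FROM orders WHERE region = 'EU' AND status = 'open';
| -- after ANALYZE, the estimated "rows=" should be much closer to the actual count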

--
Justin
