From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Frits Jalvingh <jal(at)etc(dot)to> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Fwd: temp_file_limit? |
Date: | 2022-12-20 21:11:09 |
Message-ID: | 20221220211109.GI1153@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> @justin
>
> I tried the create statistics variant and that definitely improves the
> estimate, and with that one of the "bad" cases (the one with the 82 minute
> plan) now creates a good plan using only a few seconds.
> That is a worthwhile path to follow. A bit hard to do, because those
> conditions can be anything, but I can probably calculate the ones used per
> customer and create those correlation statistics from that... It is
> definitely better than tweaking the "poor man's query hints" enable_xxxx
> 8-/ which is really not helping with plan stability either.
>
> That will be a lot of work, but I'll let you know the results ;)
Yeah, if the conditions are arbitrary, then it's going to be more
difficult. Hopefully you don't have too many columns. :)
I suggest enabling autoexplain and monitoring for queries which were
slow, and retroactively adding statistics to those columns which are
most-commonly queried, and which have correlations (which the planner
doesn't otherwise know about).
You won't want to have more than a handful of columns in a stats object
(since it requires factorial(N) complexity), but you can have multiple
stats objects with different combinations of columns (and, in v14,
expressions). You can also set a lower stats target to make the cost a
bit lower.
You could try to check which columns are correlated, either by running:
| SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
for different combinations of columns.
Or by creating a tentative/experimental stats object on a handful of
columns at a time for which you have an intuition about their
correlation, and then checking the calculated dependencies FROM
pg_stats_ext. You may need to to something clever to use that for
arbitrarily columns. Maybe this is a start.
| SELECT dep.value::float, tablename, attnames, dep.key, exprs FROM (SELECT (json_each_text(dependencies::text::json)).* AS dep, * FROM pg_stats_ext)dep WHERE dependencies IS NOT NULL ORDER BY 1 DESC ; -- AND regexp_count(key, ',') < 2
--
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Frits Jalvingh | 2022-12-23 08:12:08 | Re: Fwd: temp_file_limit? |
Previous Message | Frits Jalvingh | 2022-12-19 20:32:33 | Re: Fwd: temp_file_limit? |