Re: Fwd: temp_file_limit?

From: Frits Jalvingh <jal(at)etc(dot)to>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: temp_file_limit?
Date: 2022-12-23 08:12:08
Message-ID: CAKhTGFU4rMhDgWewvOyTQCJfJ4Esqxm3Y93TA6d1tNz9mSY3PQ@mail.gmail.com
Lists: pgsql-performance

Hi Justin,

As our queries are generated, I decided to create a peephole-optimizer kind
of thing that scans the generated SQL AST for multiple conditions on the
same table reference. I can then use our metadata to see whether those
references are expected to be correlated. This creates about 20 statistics
sets, including the one you indicated. This at least gives the problematic
query a stable and very fast plan (so far). I had hoped for more
improvement in the other queries, but that has not yet been evident ;)
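
For illustration, one of the generated statistics sets looks roughly like
this (table and column names are made up here, not our real schema):

  CREATE STATISTICS cst_orderline_cust (dependencies, ndistinct)
    ON customer_id, delivery_region FROM order_line;
  ANALYZE order_line;  -- the statistics are only populated on the next analyze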

Thanks a lot for the tips and your help!

Cordially,

Frits

On Tue, Dec 20, 2022 at 10:11 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> > @justin
> >
> > I tried the create statistics variant and that definitely improves the
> > estimate, and with that one of the "bad" cases (the one with the 82
> > minute plan) now creates a good plan using only a few seconds.
> > That is a worthwhile path to follow. A bit hard to do, because those
> > conditions can be anything, but I can probably calculate the ones used
> > per customer and create those correlation statistics from that... It is
> > definitely better than tweaking the "poor man's query hints" enable_xxxx
> > 8-/ which is really not helping with plan stability either.
> >
> > That will be a lot of work, but I'll let you know the results ;)
>
> Yeah, if the conditions are arbitrary, then it's going to be more
> difficult. Hopefully you don't have too many columns. :)
>
> I suggest enabling auto_explain and monitoring for queries which were
> slow, and retroactively adding statistics to those columns which are
> most-commonly queried, and which have correlations (which the planner
> doesn't otherwise know about).
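>
> A minimal sketch of such a configuration (the threshold values here are
> illustrative, not a recommendation):
> | shared_preload_libraries = 'auto_explain'  # requires a server restart
> | auto_explain.log_min_duration = '10s'      # log plans of queries slower than this
> | auto_explain.log_analyze = on              # include actual row counts (adds overhead)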
>
> You won't want to have more than a handful of columns in a stats object
> (since it requires factorial(N) complexity), but you can have multiple
> stats objects with different combinations of columns (and, in v14,
> expressions). You can also set a lower stats target to make the cost a
> bit lower.
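>
> For instance (a sketch with made-up names; ALTER STATISTICS ... SET
> STATISTICS needs v13 or later):
> | CREATE STATISTICS orders_dep (dependencies) ON customer_id, region_id
> |   FROM orders;
> | ALTER STATISTICS orders_dep SET STATISTICS 100; -- lower target, cheaper ANALYZE
> | ANALYZE orders;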
>
> You could try to check which columns are correlated, either by running:
> | SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
> for different combinations of columns.
>
> Or by creating a tentative/experimental stats object on a handful of
> columns at a time for which you have an intuition about their
> correlation, and then checking the calculated dependencies from
> pg_stats_ext. You may need to do something clever to use that for
> arbitrary columns. Maybe this is a start:
> | SELECT dep.value::float, tablename, attnames, dep.key, exprs
> |   FROM (SELECT (json_each_text(dependencies::text::json)).* AS dep, *
> |           FROM pg_stats_ext) dep
> |  WHERE dependencies IS NOT NULL
> |  ORDER BY 1 DESC;
> | -- AND regexp_count(key, ',') < 2
>
> --
> Justin
>
