Re: Making auto_explain more useful / convenient

From: Vladimir Churyukin <vladimir(at)churyukin(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Making auto_explain more useful / convenient
Date: 2023-11-11 17:03:48
Message-ID: CAFSGpE2Vo5iB1Fiauazy_t4Em8ToBeWJRLqqWdk0wv7FOv86fg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 11, 2023 at 7:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Vladimir Churyukin <vladimir(at)churyukin(dot)com> writes:
> > Why not have an option to return EXPLAIN results as a NoticeResponse
> > instead? That would make its usage more convenient.
>
> That seems quite useless to me, and likely actually counterproductive.
> If you are manually investigating query performance, you can just use
> EXPLAIN directly. The point of auto_explain, ISTM, is to capture info
> about queries issued by automated applications. So something like the
> above could only work if you taught every one of your applications to
> capture the NOTICE output, separate it from random other NOTICE
> output, and then (probably) log it somewhere central for later
> inspection. That's a lot of code to write, and at the end you'd
> only have effectively duplicated existing tooling such as pgbadger.
> Also, what happens in applications you forgot to convert?
>
>
Sergey Kornilov already gave the right answer to this one earlier in the
thread.
Unfortunately, there are many scenarios where pgbadger or any other log
analysis tool is either unavailable or inconvenient.
There are a number of cloud-hosted forks of Postgres, for example, and not
all of them give you this functionality.
In AWS, for example, you need to download all the logs first, which
complicates things significantly.
The goal here is not to investigate the performance of a single query but
rather to continuously monitor a set of queries (or all of them), so that
you can detect plan degradations right away.

> > Another thing is tangentially related...
> > I think it may be good to have a number of options to generate
> > significantly shorter output similar to EXPLAIN. EXPLAIN is great, but
> > sometimes people need more concise and specific information, for example
> > total number of buffers and reads by certain query (this is pretty common),
> > whether or not we had certain nodes in the plan (seq scan, scan of certain
> > index(es)), how bad was cardinality misprediction on certain nodes, etc.
>
> Maybe, but again I'm a bit skeptical. IME you frequently don't know
> what you're looking for until you've seen the bigger picture. Zeroing
> in on details like this could be pretty misleading.
>
>
If you don't know what you're looking for, then it's not very useful, I
agree.
But in many cases you do know. There are certain generic "signs of trouble"
that you can detect from the amount of data the query processor scans, the
cache hit rate for certain queries, the presence of seq scans or scans of
certain indexes, large differences between predicted and actual rows, and
other things specific to your app/queries that you want to monitor.
We're already doing similar analysis on our side (a multi-terabyte db
cluster with hundreds of millions to billions of queries running daily).
But it's not efficient enough because:
1. As I mentioned above, access to logs is limited in cloud environments.
2. EXPLAIN output can be huge, and its size causes performance issues.
Compact output is much preferable for mass processing. (This matters even
more if the output goes to notice messages rather than to logs, which is
why I said it's tangentially related.)
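
As an illustration only (this is not code from the thread or from
auto_explain), the "signs of trouble" listed above could be reduced to a
compact summary roughly like the sketch below. It assumes the plan was
produced with EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) and relies on the
field names that format emits ("Node Type", "Plan Rows", "Actual Rows",
"Shared Hit Blocks", and so on). The function name summarize_plan and the
misestimate_factor threshold are hypothetical.

```python
import json


def summarize_plan(explain_json, misestimate_factor=10.0):
    """Reduce an EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) result to a few signals.

    summarize_plan and misestimate_factor are hypothetical names used only
    for this sketch.
    """
    doc = json.loads(explain_json) if isinstance(explain_json, str) else explain_json
    root = doc[0]["Plan"]

    summary = {
        # Buffer counters on a node include its children, so the root node
        # carries the totals for the whole query.
        "shared_hit": root.get("Shared Hit Blocks", 0),
        "shared_read": root.get("Shared Read Blocks", 0),
        "seq_scans": [],     # relations read by a Seq Scan node
        "indexes": [],       # indexes used by any node
        "misestimates": [],  # (node type, estimated rows, actual rows)
    }

    # Walk the plan tree iteratively; child nodes live under "Plans".
    stack = [root]
    while stack:
        node = stack.pop()
        if node.get("Node Type") == "Seq Scan":
            summary["seq_scans"].append(node.get("Relation Name"))
        if "Index Name" in node:
            summary["indexes"].append(node["Index Name"])
        if "Actual Rows" in node:
            # Clamp to 1 to avoid division by zero on empty results.
            est = max(node.get("Plan Rows", 0), 1)
            act = max(node["Actual Rows"], 1)
            if max(est / act, act / est) >= misestimate_factor:
                summary["misestimates"].append((node["Node Type"], est, act))
        stack.extend(node.get("Plans", []))

    return summary
```

A summary like this is a handful of numbers and names per query, so it is
far cheaper to ship and aggregate than the full plan text. Which signals
to keep, and what threshold counts as a bad estimate, would depend on the
application.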

Since it seems the notice output is already possible, half of the problem
is solved already.
I'll try to come up with possible options for more compact output
then, unless you think it's completely futile.

thank you,
-Vladimir Churyukin
