Re: proposal - log_full_scan

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: proposal - log_full_scan
Date: 2021-07-06 14:07:33
Message-ID: 3E8BD81C-07C9-46F2-98D3-2E4EEEFB82FD@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Looking at this I like the idea in principle, but I'm not convinced that
auto_explain is the right tool for this. auto_explain is for identifying slow
queries, and what you are proposing is to identify queries with a certain
"shape" (for lack of a better term) even if they aren't slow as per the
log_min_duration setting. If log_min_duration is deemed to crude due to query
volume then sample_rate is the tool. If sample_rate is also discarded, then
pg_stat_statements seems a better option.

Also, why just sequential scans (apart from it being this specific usecase)?
If the idea is to track aspects of execution which are deemed slow, then
tracking for example spills etc would be just as valid. Do you have thoughts
on that?

That being said, a few comments on the patch:

- (auto_explain_log_min_duration >= 0 && \
+ ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \
Is there a reason to not follow the existing code and check for >= 0?

+ DefineCustomIntVariable("auto_explain.log_seqscan",
It's only a matter of time before another node is proposed for logging, and
then we'll be stuck adding log_XXXnode GUCs. Is there a more future-proof way
to do this?

+ "Sets the minimum tuples produced by sequantial scans which plans will be logged",
s/sequantial/sequential/

- es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
+ es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1));
Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO.

+ * Colllect relations where log_seqscan limit was exceeded
s/Colllect/Collect/

+ if (*relnames.data != '\0')
+ appendStringInfoString(&relnames, ",");
This should use appendStringInfoChar instead.

+ (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s",
The "over limit" part is superfluous since it otherwise wouldn't be logged. If
we're prefixing something wouldn't it be more helpful to include the limit,
like: "seqscans >= %d tuples returned:". I'm not a fan of "seqscans" but
spelling it out is also quite verbose and this is grep-able.

Documentation and tests are also missing

--
Daniel Gustafsson https://vmware.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dipesh Pandit 2021-07-06 14:38:42 Re: .ready and .done files considered harmful
Previous Message Dilip Kumar 2021-07-06 13:39:38 Re: [PATCH] Use optimized single-datum tuplesort in ExecSort