Re: Find query characters in respect of optimizer for develop purpose

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Find query characters in respect of optimizer for develop purpose
Date: 2020-06-11 21:29:15
Message-ID: CAAKRu_Y64GmVLxjs0RTaUVLz7m5+D=6h=yM4Rw9GhiLhSNbkKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 18, 2020 at 1:30 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:

> Hello:
>
> Before I want to pay attention to some optimizer features, I want to
> estimate how much benefits it can create for customers, at least for our
> current
> running customer. So I want to have some basic idea what kind of the query
> is
> running now in respect of optimizer.
>
>
You are imagining this to be collected during planning on a live
customer system as a form of telemetry?
I was inspired to search the hackers mailing list archive for the word
"telemetry" and didn't get many hits, which surprised me.

>
> My basic is we can track it with the below struct(every backend has one
> global
> variable to record it).
>
> +typedef struct
> +{
> + int subplan_count;
> + int subquery_count;
> + int join_count;
> + bool hasagg;
> + bool hasgroup;
> +} QueryCharacters;
>
> it will be reset at the beginning of standard_planner, and the values are
> increased at make_subplan, set_subquery_pathlist, make_one_rel,
> create_grouping_paths. later it can be tracked and viewed in
> pg_stat_statements.
>
>
I think the natural reaction to this idea is: isn't there a 3rd party
tool that does this? Or can't you use one of the hooks and write an
extension, to, for example, examine the parse,query,and plan trees?

However, it does seem like keeping track of this information would be
much easier during planning since planner will be examining the query
tree and making the plan anyway.

On the other hand, I think that depends a lot on what specific
information you want to collect. Out of the fields you listed, it is
unclear what some of them would mean.
Does join_count count the number of explicit joins in the original query
or does it count the number of joins in the final plan? Does
subquery_count count all sub-selects in the original query or does it
only count subqueries that become SubqueryScans or SubPlans? What about
subqueries that become InitPlans?

One concern I have is that it seems like this struct would have to be
updated throughout planning and that it would be easy to break it with
the addition of new code. Couldn't every new optimization added to
planner potentially affect the accuracy of the information in the
struct?

--
Melanie Plageman

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-06-11 21:44:32 Re: Recording test runtimes with the buildfarm
Previous Message Andrew Dunstan 2020-06-11 21:26:49 Re: Recording test runtimes with the buildfarm