RE: Planning counters in pg_stat_statements (using pgss_store)

From: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>
To: 'Julien Rouhaud' <rjuju123(at)gmail(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>
Cc: legrand legrand <legrand_legrand(at)hotmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Marco Slot <marco(at)citusdata(dot)com>
Subject: RE: Planning counters in pg_stat_statements (using pgss_store)
Date: 2020-03-13 06:54:28
Message-ID: OSAPR01MB46099E7D03547A2A7292716494FA0@OSAPR01MB4609.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote:
> On Thu, Mar 12, 2020 at 1:11 PM Marco Slot <marco(dot)slot(at)gmail(dot)com> wrote:
> > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123(at)gmail(dot)com>
> wrote:
> > > There's at least the current version of IVM patchset that lacks the
> > > querytext. Looking at various extensions, I see that pg_background
> > > and pglogical call pg_plan_query internally but shouldn't have any
> > > issue providing the query text. But there's also citus extension,
> > > which don't keep around the query string at least when distributing
> > > plans, which makes sense since it's of no use and they're heavily
> > > modifying the original Query. I think that citus folks opinion on
> > > the subject would be useful, so I'm Cc-ing Marco.
> >
> > Most of the time we keep our Query * data structures in a form that
> > can be deparsed back into a query string by a modified copy of
> > ruleutils.c, so we could generate a correct query string if absolutely
> > necessary, though there are performance-sensitive cases where we'd
> > rather not have the deparsing overhead.
>
> Yes, deparsing is probably too expensive for that use case.
>
> > In case of INSERT..SELECT into a distributed table, we might call
> > pg_plan_query on the SELECT part (Query *) and send the output into a
> > DestReceiver that sends tuples into shards of the distributed table
> > via COPY. The fact that SELECT does not show up in pg_stat_statements
> > separately is generally fine because it's an implementation detail,
> > and it would probably be a little confusing because the user never ran
> > the SELECT query. Moreover, the call to pg_plan_query would already be
> > reflected in the planning or execution time of the top-level query, so
> > it would be double counted if it had its own entry.
>
> Well, surprising statements can already appears in pg_stat_statements when
> you use some psql features, or if you have triggers as those will run additional
> queries under the hood.
>
> The difference here is that since citus is a CustomNode, underlying calls to
> planner will be accounted for that node, and that's indeed annoying. I can see
> that citus is doing some calls to spi_exec or
> Executor* (in ExecuteLocalTaskPlan), which could also trigger
> pg_stat_statements, but I don't know if a queryid is present there.
>
> > Another case is when some of the shards turn out to be local to the
> > server that handles the distributed query. In that case we plan the
> > queries on those shards via pg_plan_query instead of deparsing and
> > sending the query string to a remote server. It would be less
> > confusing for these queries to show in pg_stat_statements, because the
> > queries on the shards on remote servers will show up as well. However,
> > this is a performance-sensitive code path where we'd rather avoid
> > deparsing.
>
> Agreed.
>
> > In general, I'd prefer if there was no requirement to pass a correct
> > query string. I'm ok with passing "SELECT 'citus_internal'" or just ""
> > if that does not lead to issues. Passing NULL to signal that the
> > planner call should not be tracked separately does seem a bit cleaner.
>
> That's very interesting feedback, thanks! I'm not a fan of giving a way for
> queries to say that they want to be ignored by pg_stat_statements, but double
> counting the planning counters seem even worse, so I'm +0.5 to accept NULL
> query string in the planner, incidentally making pgss ignore those.

It is preferable that we can count various queries statistics as much as possible
but if it causes overhead even when without using pg_stat_statements, we would
not have to force them to set appropriate query_text.
About settings a fixed string in query_text, I think it doesn't make much sense
because users can't take any actions by seeing those queries' stats. Moreover, if
we set a fixed string in query_text to avoid pg_stat_statement's errors, codes
would be inexplicable for other developers who don't know there's such
requirements.
After all, I agree accepting NULL query string in the planner.

I don't know it is useful but there are also codes that avoid an error when
sourceText is NULL.

executor_errposition(EState *estate, int location)
{
...
/* Can't do anything if source text is not available */
if (estate == NULL || estate->es_sourceText == NULL)
}

--
Yoshikazu Imai

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-03-13 07:04:50 Re: [PATCH] Skip llvm bytecode generation if LLVM is missing
Previous Message Craig Ringer 2020-03-13 06:39:43 Re: logical replication empty transactions