Re: query_id, pg_stat_activity, extended query protocol

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, kaido vaikla <kaido(dot)vaikla(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: query_id, pg_stat_activity, extended query protocol
Date: 2024-09-11 23:07:31
Message-ID: ZuIis9-X9YwuppFL@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 11, 2024 at 05:02:07PM -0500, Sami Imseih wrote:
> In your 0003-Report-query-ID-for-execute-fetch-in-extended-query-.patch
> patch, you are still setting the queryId inside exec_execute_message
> if (execute_is_fetch). This condition could be removed and don't need to set
> the queryId inside ExecutorRun. This is exactly what v5-0001 does.
>
> V5-0001 also sets the queryId inside the exec_bind_message.
> We must do that otherwise we will have a NULL queryId during bind.
>
> also tested it against this for the case that was raised by Robert [1].

There are a few ways to do things:
- Add an extra report in ExecutorRun(), because we know that it is
going to be what we are going to cross when using a portal with
multiple execution calls. This won't work for the case of multiple
fetch messages where there is only one initial ExecutorRun() call
followed by the tuple fetches, as you say.
- Add these higher in the stack, when processing the messages. In
which case, we can also argue about removing the calls in
ExecutorRun() and ExecutorStart(), entirely, because these are
unnecessary duplicates as long as the query ID is set close to where
it is reset when we are processing the kind and execute messages.
ExecutorStart() as report location is ill-thought from the start.
- Keep all of them, relying on the first one set as the follow-up ones
are harmless. Perhaps also just reduce the number of calls on HEAD.

After sleeping on it, I'd tend to slightly favor the last option in
the back-branches and the second option on HEAD where we reduce the
number of report calls. This way, we are a bit more careful in
released branches by being more aggressive in reporting the query ID.
That's also why I have ordered the previous patch set this way but
that was badly presented, even if it does not take care of the
processing of the execute_is_fetch case for execute messages.

The tests in pg_stat_statements are one part I'm pretty sure is one
good way forward. It is not perfect, but with the psql meta-commands
we have a good deal of coverage on top of the other queries already in
the test suite. That's also the only place in core where we force the
query ID across all these hooks, and this does not impact switching
the way stats are stored if we were to switch to pgstats in shmem with
the custom stats APIs.

> I am not sure how we can add tests for RevalidateCachedQuery though using
> pg_stat_statements. We could skip testing this scenario, maybe??

Perhaps. I'd need to think through this one. Let's do things in
order and see about the reports for the bind/execute messages, first,
please?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-09-11 23:12:30 Re: Separate HEAP WAL replay logic into its own file
Previous Message Jacob Champion 2024-09-11 23:00:25 Re: libpq: Process buffered SSL read bytes to support records >8kB on async API