Quick Links

Re: compute_query_id and pg_stat_statements

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	rjuju123(at)gmail(dot)com
Cc:	magnus(at)hagander(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, sfrost(at)snowman(dot)net, bruce(at)momjian(dot)us, myon(at)debian(dot)org, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: compute_query_id and pg_stat_statements
Date:	2021-05-12 05:33:35
Message-ID:	20210512.143335.1053382893089496881.horikyota.ntt@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote in
> Hello Horiguchi-san,
>
> On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote:
> >
> > If we look it in pg_settings, it shows the current value and the value
> > at boot-time. So I'm fine with that behavior.
> >
> > However, IMHO, I doubt the necessity of "on". Assuming that we require
> > any module that wants query-id to call queryIdWanted() at any time
> > after each process startup (or it could be inherited to children), I
> > think only "auto" and "off" are enough for the variable.
>
> I don't think that this approach would cope well for people who want a queryid
> without pg_stat_statements or such. Since the queryid can now be found in
> pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> to allow users to benefit from that even if they don't install additional
> module.

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

> > Thinking in
> > this line, the variable is a subset of a GUC variable to specify the
> > name of a query-id provider (as Andres suggested upthread), and I
> > think it would work better in future.
> >
> > So for now I propose that we have a variable query_id_provider that
> > has only 'default' and 'none' as the domain.
>
> I think this would be a mistake to do that, as it would mean that we don't
> officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers. It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

(Anyway I prefer to load query-id provider as a dynamically loadable
module rather than hook-function.)

> > We can later expand it
> > so that any other query-id provider modules can be loaded without
> > chaning the interface.
>
> The GUC itself may not change, but third-party queryid provider would probably
> need changes as the new entry point will be dedicated to compute a queryid
> only, while third-party plugins may do more than that in their
> post_parse_analyze_hook. And also users will have to change their

I don't think it is not that a problem. Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code. I guess...

> configuration to use that new interface, and additionally the module may now
> have to be removed from shared_preload_libraries. Overall, it doesn't seem to
> me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries? The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core. If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly. I don't think that is
not a problem but if we wanted to make it easier, we can let users
free from that step by allowing 'auto' for query-id-provider to load
any module by the first extension.

So, for example, how about the following interface?

GUC query_id_provider:

- 'none' : query_id is not calculated, don't allow loading external
generator module.

- 'default' : use default provider and calculate query-id.

- '<provider-name>' : use the provider and calculate query-id using it.

- 'auto' : query_id is not calculated, but allow to load query-id
provider if queryIdWanted() is called.

# of course 'auto' and 'default' are inhibited as the provier name.

- core function bool queryIdWanted(char *provider_name, bool use_existing)

Allows extensions to request to load a provider if not yet, then
start calculating query-id. Returns true if the request is accepted.

provider_name :

- 'default' or '<provider-name>': requests the provider to be loaded
and start calculating query-id. Refuse the request if 'none' is
set to query_id_provider.

use_existing: Set true to allow using a provider already loaded.
Otherwise refuses the request if any other provider than
prvoder_name is already loaded.

In most cases users set query_id_provider to 'auto' and extensions
call queryIdWanted with ('default', true).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: compute_query_id and pg_stat_statements at 2021-05-12 02:42:01 from Julien Rouhaud

Responses

Re: compute_query_id and pg_stat_statements at 2021-05-12 05:44:47 from Kyotaro Horiguchi
Re: compute_query_id and pg_stat_statements at 2021-05-12 05:49:13 from Pavel Stehule
Re: compute_query_id and pg_stat_statements at 2021-05-12 06:05:16 from Julien Rouhaud

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Pavel Stehule	2021-05-12 05:37:14	Re: Schema variables - new implementation for Postgres 15
Previous Message	Dilip Kumar	2021-05-12 05:26:38	Re: parallel vacuum - few questions on docs, comments and code