Re: compute_query_id and pg_stat_statements

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: magnus(at)hagander(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, sfrost(at)snowman(dot)net, bruce(at)momjian(dot)us, myon(at)debian(dot)org, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: compute_query_id and pg_stat_statements
Date: 2021-05-12 06:05:16
Message-ID: 20210512060516.g2ebxgvbiglkgfv7@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote in
> >
> > I don't think that this approach would cope well for people who want a queryid
> > without pg_stat_statements or such. Since the queryid can now be found in
> > pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> > to allow users to benefit from that even if they don't install additional
> > module.
>
> Ah, I missed that case. And we are wanting to use pg_stat_statements
> with (almost) zero-config? How about the following behavior?
>
> Setting query_id_provider to 'none' means we don't calculate query-id
> by default. However, if queryIdWante() is called, the default provider
> is set up and starts calculating query id.

Having "none" meant "not unless someone asks for it" looks like a POLA
violation.

> Setting query_id_provider to something else means the user wants
> query-id calcualted using the provider. Setting 'default' is
> equivalent to setting compute_query_id to 'on'.
>
> There might be a case where a user sets query_id_provider to
> non-'none' but don't want have query-id calculated, but it can be said
> a kind of mis-configuration?

So if I'm understanding correctly, you're arguing for an approach different to
what Michael stated as the general consensus in [1]. I'm not saying that I
think it's a bad idea (and I actually suggested it before), but we have to
chose an approach and stick with it.

> > I think this would be a mistake to do that, as it would mean that we don't
> > officially support alternative queryid provider.
>
> Ok, if we want to support alternative providers from the first, we
> need to actually write the loader code for query-id providers. It
> would not be so hard?, but it might not be suitable to this stage so I
> proposed that to get rid of needing such complexity for now.

I did write a POC extension [2] to demonstrate that moving pg_stat_statement's
queryid calculation in core doesn't mean that we're imposing it to everyone.
And yes this is critical and a must have in the initial implementation.

> (Anyway I prefer to load query-id provider as a dynamically loadable
> module rather than hook-function.)

I agree that having a specific API (I'm fine with a hook or a dynamically
loaded function) for that would be better, but it doesn't appear to be the
opinion of the majority.

> > The GUC itself may not change, but third-party queryid provider would probably
> > need changes as the new entry point will be dedicated to compute a queryid
> > only, while third-party plugins may do more than that in their
> > post_parse_analyze_hook. And also users will have to change their
>
> I don't think it is not that a problem.

Did you mean "I don't think that it's a problem"? Otherwise I don't get it.

> Even if any third-party
> extension is having query-id generator by itself, in most cases it
> would be a copy of JumbleQuery in case of pg_stat_statement is not
> loaded and now it is moved in-core as 'default' provider. What the
> exntension needs to be done is just ripping out the copied generator
> code. I guess...

I don't fully understand, but it seems that you're arguing that the only use
case is to have something similar to pg_stat_statements (say e.g.
pg_store_plans), that always have the same queryid implementation as
pg_stat_statements. That's not the case, as there already are "clones" of
pg_stat_statements, and the main difference is an alternative queryid
implementation. So in that case what author would do is to drop everything
*except* the queryid implementation.

And if I'm not mistaken, pg_store_plans also wants a different queryid
implementation, but has to handle a secondary queryid on top of it
(https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855)
So here again what the extension want is to get rid of pg_stat_statements (and
now core) queryid implementation.

> > configuration to use that new interface, and additionally the module may now
> > have to be removed from shared_preload_libraries. Overall, it doesn't seem to
> > me that it would make users' life easier.
>
> Why the third-party module need to be removed from
> shared_preload_libraries? The module can stay as a preloaded shared
> library but just no longer need to have its own query-id provider
> since it is provided in-core. If the extension required a specific
> provider, the developer need to make it a loadable module and users
> need to specify the provider module explicitly.

It's the same misunderstanding here. Basically people want to benefit from the
whole ecosystem based on a queryid (pg_stat_statements, now
pg_stat_activity.query_id and such) but with another definition of what a
queryid is. So those people will now only need to implement something like
[2], rather than forking every single extension they want to use.

[1]: https://www.postgresql.org/message-id/YJoeXcrwe1EDmqKT@paquier.xyz
[2]: https://github.com/rjuju/pg_queryid

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-05-12 06:10:49 Re: compute_query_id and pg_stat_statements
Previous Message Pavel Stehule 2021-05-12 05:49:13 Re: compute_query_id and pg_stat_statements