Re: pgsql: Move pg_stat_statements query jumbling to core.

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Move pg_stat_statements query jumbling to core.
Date: 2021-04-08 10:56:09
Message-ID: 20210408105609.ducxkeqidufygwqr@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On Thu, Apr 08, 2021 at 03:52:02PM +0530, Amit Kapila wrote:
> On Thu, Apr 8, 2021 at 3:11 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > On Thu, Apr 08, 2021 at 02:52:26PM +0530, Amit Kapila wrote:
> > >
> > > Yeah, we do pass instrumentation flags from leader to worker but not
> > > the one allocated via pg_stat_staments (pgss_ExecutorStart {...
> > > queryDesc->totaltime ...}.
> >
> > Ah indeed, I now remember that problem that I had to deal with in some
> > extensions [1]. For the record this was initially raised by Tomas in [2], but
> > unfortunately that thread got forgotten and probably all third-party extensions
> > ended up allocating some shmem for their need.
> >
> > > However, I guess prior to this patch those
> > > were anyway not allocated separately because queryid was not passed to
> > > workers. So, I think you are right it shouldn't change the previous
> > > behavior.
> >
> > I think that previously parallel workers always added the required
> > instrumentation if pg_stat_statements is loaded,
> >
>
> How? AFAICS, in pgss_ExecutorStart(), we allocate the instrumentation
> structure only if queryid is non-zero which prior to this patch
> shouldn't happen.

Doh right, I think I need some sleep :)

So I tried to run that scenario against pg13 and master + the ignore parallel
worker patch:

CREATE TABLE pgss AS SELECT generate_series(1, 10000) id;

ALTER TABLE pgss SET (max_parallel_workers = 10);
SET force_parallel_mode = on;
SET parallel_leader_participation = off;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

SELECT version();
SELECT pg_prewarm('pgss');

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, COSTS OFF) SELECT COUNT(*) FROM pgss;
SELECT pg_stat_statements_reset();
SELECT COUNT(*) FROM pgss;
SELECT query, shared_blks_hit FROM pg_stat_statements WHERE query LIKE '%pgss%';

On pg13, I get:

pg_prewarm
------------
45
(1 row)

version
---------------------------------------------------------------------------------------------------------
PostgreSQL 13(dot)2(at)49076fd3ba on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 10.2.0-r5 p6) 10.2.0, 64-bit
(1 row)

QUERY PLAN
-----------------------------------------------------------------------------------
Gather (actual time=8.124..9.004 rows=1 loops=1)
Output: (count(*))
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=45
-> Aggregate (actual time=3.649..3.650 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
Worker 0: actual time=3.649..3.650 rows=1 loops=1
Buffers: shared hit=45
-> Seq Scan on public.pgss (actual time=0.057..2.331 rows=10000 loops=1)
Output: id
Buffers: shared hit=45
Worker 0: actual time=0.057..2.331 rows=10000 loops=1
Buffers: shared hit=45
Planning:
Buffers: shared hit=15 read=5
Planning Time: 0.211 ms
Execution Time: 9.051 ms
(20 rows)

pg_stat_statements_reset
--------------------------

(1 row)

count
-------
10000
(1 row)

query | shared_blks_hit
---------------------------+-----------------
SELECT COUNT(*) FROM pgss | 45
(1 row)

On head + patch:

pg_prewarm
------------
45
(1 row)

------------------------------------------------------------------------------------------------------>
PostgreSQL 14devel-O0(at)45885bee58 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 10.2.0-r5 p6) 10.2.0,>
(1 row)

QUERY PLAN
------------------------------------------------------------------------------------
Gather (actual time=39.278..42.912 rows=1 loops=1)
Output: (count(*))
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=45
-> Aggregate (actual time=20.251..20.254 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
Worker 0: actual time=20.251..20.254 rows=1 loops=1
Buffers: shared hit=45
-> Seq Scan on public.pgss (actual time=0.214..13.746 rows=10000 loops=1)
Output: id
Buffers: shared hit=45
Worker 0: actual time=0.214..13.746 rows=10000 loops=1
Buffers: shared hit=45
Query Identifier: 5605505304133078562
Planning:
Buffers: shared hit=21
Planning Time: 0.857 ms
Execution Time: 43.237 ms
(21 rows)

pg_stat_statements_reset
--------------------------

(1 row)

count
-------
10000
(1 row)

query | shared_blks_hit
---------------------------+-----------------
SELECT COUNT(*) FROM pgss | 45
(1 row)

And I get the exact same results if I change the scenario to SET
force_parallel_mode = off. So it seems that it was working as expected as I
can see the 45 buffers being access, and is still working as expected, although
I'm not exactly sure why right now if queryDesc->totaltime isn't somehow
gathered by the parallel worker instrastructure. (Note that 45885bee58 is my
local commit for "Ignore parallel workers in pg_stat_statements").

Also note that the buffers shown in the planning part are due to the catcache
population, so they're not expected on the 2nd query planning.

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Thomas Munro 2021-04-08 11:23:10 pgsql: Add circular WAL decoding buffer.
Previous Message David Rowley 2021-04-08 10:36:10 pgsql: Cleanup partition pruning step generation