Re: Logging parallel worker draught

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Logging parallel worker draught
Date: 2023-10-09 15:51:34
Message-ID: 10D5C31A-A303-4440-BCA2-921440A9E1CC@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

This thread has been quiet for a while, but I'd like to share some
thoughts.

+1 to the idea of improving visibility into parallel worker saturation.
But overall, we should improve parallel processing visibility, so DBAs can
detect trends in parallel usage ( is the workload doing more parallel, or doing less )
and have enough data to either tune the workload or change parallel GUCs.

>> We can output this at the LOG level to avoid running the server at
>> DEBUG1 level. There are a few other cases where we are not able to
>> spawn the worker or process and those are logged at the LOG level. For
>> example, "could not fork autovacuum launcher process .." or "too many
>> background workers". So, not sure, if this should get a separate
>> treatment. If we fear this can happen frequently enough that it can
>> spam the LOG then a GUC may be worthwhile.

> I think we should definitely be afraid of that. I am in favor of a separate GUC.

Currently explain ( analyze ) will give you the "Workers Planned"
and "Workers launched". Logging this via auto_explain is possible, so I am
not sure we need additional GUCs or debug levels for this info.

-> Gather (cost=10430.00..10430.01 rows=2 width=8) (actual tim
e=131.826..134.325 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2

>> What I was wondering was whether we would be better off putting this
>> into the statistics collector, vs. doing it via logging. Both
>> approaches seem to have pros and cons.
>>
>> I think it could be easier for users to process the information if it
>> is available via some view, so there is a benefit in putting this into
>> the stats subsystem.

> Unless we do this instead.

Adding cumulative stats is a much better idea.

3 new columns can be added to pg_stat_database:
workers_planned,
workers_launched,
parallel_operations - There could be more than 1 operation
per query, if for example there are multiple Parallel Gather
operations in a plan.

With these columns, monitoring tools can trend if there is more
or less parallel work happening over time ( by looking at parallel
operations ) or if the workload is suffering from parallel saturation.
workers_planned/workers_launched < 1 means there is a lack
of available worker processes.

Also, We could add this information on a per query level as well
in pg_stat_statements, but this can be taken up in a seperate
discussion.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-10-09 16:14:39 Re: should frontend tools use syncfs() ?
Previous Message Alvaro Herrera 2023-10-09 15:48:50 Re: pg_resetwal: Corrections around -c option