Re: O(n) tasks cause lengthy startups and checkpoints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Date: 2021-12-13 13:53:37
Message-ID: CA+Tgmoag+stJPU9Vxoyaq6gzZGPLQ9C+edZcKb4PgW-yTB8LaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> Well, I haven't had a chance to look at your patch, and my patch set
> still only has handling for CheckPointSnapBuild() and
> RemovePgTempFiles(), but I thought I'd share what I have anyway. I
> split it into 5 patches:
>
> 0001 - Adds a new "custodian" auxiliary process that does nothing.
> 0002 - During startup, remove the pgsql_tmp directories instead of
> only clearing the contents.
> 0003 - Split temporary file cleanup during startup into two stages.
> The first renames the directories, and the second clears them.
> 0004 - Moves the second stage from 0003 to the custodian process.
> 0005 - Moves CheckPointSnapBuild() to the custodian process.

I don't know whether this kind of idea is good or not.

One thing we've seen a number of times now is that entrusting the same
process with multiple responsibilities often ends poorly. Sometimes
it's busy with one thing when another thing really needs to be done
RIGHT NOW. Perhaps that won't be an issue here since all of these
things are related to checkpointing, but then the process name should
reflect that rather than making it sound like we can just keep piling
more responsibilities onto this process indefinitely. At some point
that seems bound to become an issue.

Another issue is that we don't want to increase the number of
processes without bound. Processes use memory and CPU resources and if
we run too many of them it becomes a burden on the system. Low-end
systems may not have too many resources in total, and high-end systems
can struggle to fit demanding workloads within the resources that they
have. Maybe it would be cheaper to do more things at once if we were
using threads rather than processes, but that day still seems fairly
far off.

But against all that, if these tasks are slowing down checkpoints and
that's avoidable, that seems pretty important too. Interestingly, I
can't say that I've ever seen any of these things be a problem for
checkpoint or startup speed. I wonder why you've had a different
experience.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-12-13 13:53:48 Re: extended stats on partitioned tables
Previous Message Justin Pryzby 2021-12-13 13:48:30 Re: extended stats on partitioned tables