a pool for parallel worker

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: a pool for parallel worker
Date: 2025-03-11 12:38:38
Message-ID: 87h63zg2sx.fsf@163.com
Lists: pgsql-hackers

Hi,

Currently, when a query needs some parallel workers, the postmaster spawns
new backends for the query, and when the work is done those backends
exit. There is some waste here: the syscache, relcache, smgr cache and
vfd cache must all be rebuilt from scratch each time, and the fork/exit
syscalls themselves have a cost.

I am wondering whether we should preallocate (or create lazily) some
backends as a pool for parallel workers. The benefits include:

(1) It lowers the actual startup cost of a parallel worker.
(2) It makes the core more suitable for cases where the executor needs a
new worker to run a piece of a plan. I think this is needed by some
data-redistribution executors in a distributed database.

I guess both cases could share some well-designed code, such as the
costing, or transferring data between a worker and the leader.

The annoying thing about the pool is that it is [dbid + userId] based,
by which I mean that if the dbid or userId differs from a backend in the
pool, that backend can't be reused. To reduce the effect of userId, we
could start the pooled backends as a superuser and then switch the user
with 'SET ROLE xxx'. The pool could also be created lazily.
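The acquire path described above could be sketched roughly as below. This
is a minimal illustration, not PostgreSQL code: the struct and function
names (PooledWorker, pool_acquire, pool_release) are hypothetical, and
the SET ROLE step is modeled by simply rewriting the role id on reuse. A
backend can never switch databases, so the dbid must match exactly; a
role mismatch alone does not prevent reuse under this scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical pool entry: one idle-or-busy parallel worker backend. */
typedef struct PooledWorker
{
    Oid  dbid;      /* database the backend is attached to (fixed) */
    Oid  userid;    /* role the backend currently runs as (switchable) */
    bool in_use;
} PooledWorker;

#define POOL_SIZE 4
static PooledWorker pool[POOL_SIZE];

/*
 * Acquire a worker for (dbid, userid).  The dbid must match exactly,
 * since a backend cannot change databases.  If only the role differs,
 * the backend is reused and the role is switched, which stands in for
 * issuing "SET ROLE" in a superuser-started backend.
 */
static PooledWorker *
pool_acquire(Oid dbid, Oid userid)
{
    for (int i = 0; i < POOL_SIZE; i++)
    {
        if (!pool[i].in_use && pool[i].dbid == dbid)
        {
            pool[i].in_use = true;
            pool[i].userid = userid;    /* modelled "SET ROLE userid" */
            return &pool[i];
        }
    }
    return NULL;    /* no reusable backend: fall back to fork() */
}

static void
pool_release(PooledWorker *w)
{
    w->in_use = false;
}
```

On a miss the caller would fall back to the current behavior of asking
the postmaster to fork a fresh worker, so the pool is purely an
optimization and never a correctness requirement.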

Any comments on this idea?

--
Best Regards
Andy Fan
