From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: a pool for parallel worker
Date: 2025-03-11 12:38:38
Message-ID: 87h63zg2sx.fsf@163.com
Lists: pgsql-hackers
Hi,
Currently, when a query needs parallel workers, the postmaster spawns
new backends for that query, and when the work is done those backends
exit. There is some waste here: the syscache, relcache, smgr cache and
vfd cache are rebuilt from scratch each time, plus the cost of the
fork/exit syscalls themselves.
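For reference, the current launch path registers a brand-new bgworker
for every worker of every query; condensed from LaunchParallelWorkers()
in src/backend/access/transam/parallel.c:

    /* Condensed: each call pays fork() plus cache warm-up, then exits. */
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION | BGWORKER_CLASS_PARALLEL;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    strcpy(worker.bgw_library_name, "postgres");
    strcpy(worker.bgw_function_name, "ParallelWorkerMain");
    worker.bgw_notify_pid = MyProcPid;

    for (i = 0; i < pcxt->nworkers_to_launch; ++i)
    {
        /* a fresh backend is forked for every slot, every time */
        RegisterDynamicBackgroundWorker(&worker,
                                        &pcxt->worker[i].bgworker_handle);
    }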
I am wondering whether we should preallocate (or create lazily) some
backends as a pool for parallel workers. The benefits include:
(1) Lower the actual startup cost of a parallel worker.
(2) Make the core better suited for cases where the executor needs to
grab a new worker on demand to run a piece of a plan. I think this is
needed by data-redistribution executor nodes in a distributed database.
I guess both cases could share some well-designed code, such as costing
or transferring data between worker and leader.
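For the data-transfer part, the existing shm_mq machinery
(src/include/storage/shm_mq.h) already carries leader/worker tuple
traffic, and a pooled worker could presumably reuse it unchanged. A
simplified leader-side receive loop, where ProcessTupleFromWorker() is
a hypothetical consumer:

    /* mq lives in the DSM segment; handle is the worker's bgw handle */
    shm_mq_handle *mqh = shm_mq_attach(mq, seg, handle);

    for (;;)
    {
        Size        nbytes;
        void       *data;
        shm_mq_result res;

        res = shm_mq_receive(mqh, &nbytes, &data, /* nowait */ false);
        if (res == SHM_MQ_DETACHED)
            break;              /* worker finished and detached */
        if (res == SHM_MQ_SUCCESS)
            ProcessTupleFromWorker(data, nbytes);   /* hypothetical */
    }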
The annoying thing about the pool is that it would be [dbid + userId]
based; that is, if the dbid or userId differs from those of a backend
in the pool, that backend can't be reused. To reduce the userId
restriction, we could start the pooled backends as a superuser and then
switch the user with 'SET ROLE xxx'. The pool could also be created
lazily. A rough sketch of what checkout could look like is below.
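To make the idea concrete, here is a hypothetical sketch of a checkout
keyed by (dbid, roleid). None of these names (PoolWorker,
PoolCheckoutWorker, PoolWorkerSetRole) exist in core; they are only
illustration of the shape, not proposed API:

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    typedef struct PoolWorker
    {
        Oid         dboid;      /* database this backend is attached to */
        Oid         roleoid;    /* role currently active via SET ROLE */
        BackgroundWorkerHandle *handle;
        bool        in_use;
    } PoolWorker;

    static PoolWorker pool[32];     /* hypothetical fixed-size pool */

    /* Hypothetical: ask the pooled backend to run SET ROLE. */
    extern void PoolWorkerSetRole(PoolWorker *w, Oid roleoid);

    /*
     * Reuse an idle backend already attached to the target database,
     * switching its role if needed; NULL means fall back to forking a
     * fresh worker the way we do today.
     */
    static PoolWorker *
    PoolCheckoutWorker(Oid dboid, Oid roleoid)
    {
        for (int i = 0; i < lengthof(pool); i++)
        {
            PoolWorker *w = &pool[i];

            if (!w->in_use && w->dboid == dboid)
            {
                if (w->roleoid != roleoid)
                {
                    PoolWorkerSetRole(w, roleoid);
                    w->roleoid = roleoid;
                }
                w->in_use = true;
                return w;
            }
        }
        return NULL;
    }

Since a backend can't switch databases after startup, dboid stays a
hard key; only the role part can be papered over with SET ROLE.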
Any comments on this idea?
--
Best Regards
Andy Fan