Re: Built-in connection pooling

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Built-in connection pooling
Date: 2018-01-19 16:17:02
Message-ID: 89fe7d3f-796d-00cb-c00b-c669be94831a@postgrespro.ru
Lists: pgsql-hackers

On 19.01.2018 18:53, Tomas Vondra wrote:
>
> On 01/19/2018 10:52 AM, Konstantin Knizhnik wrote:
>>
>> On 18.01.2018 18:02, Tomas Vondra wrote:
>>> Hi Konstantin,
>>>
>>> On 01/18/2018 03:48 PM, Konstantin Knizhnik wrote:
>>>> On 17.01.2018 19:09, Konstantin Knizhnik wrote:
>>>>> Hi hackers,
>>>>>
>>>>> ...
>>> I haven't looked at the code yet, but after reading your message I have
>>> a simple question - how is this going to work with SSL? If you're only
>>> passing a file descriptor, that does not seem to be sufficient for the
>>> backends to do crypto (that requires the SSL stuff from Port).
>>>
>>> Maybe I'm missing something and it already works, though ...
>>>
>>>
>>> regards
>>>
>> Ooops, I missed this aspect with SSL. Thank you.
>> A new version of the patch, which correctly maintains the session
>> context, is attached.
>> Now each session has its own allocator which should be used instead
>> of TopMemoryAllocator. SSL connections work now.
>>
> OK. I've looked at the code, but I have a rather hard time understanding
> it, because there are pretty much no comments explaining the intent of
> the added code :-( I strongly suggest improving that, to help reviewers.
Sorry, sorry, sorry...
There are some comments and I will add more.

>
> The questions I'm asking myself are mostly these:
>
> 1) When assigning a backend, we first try to get one from a pool, which
> happens right at the beginning of BackendStartup. If we find a usable
> backend, we send the info to the backend (pg_send_sock/pg_recv_sock).
>
> But AFAICS this only happens at connection time, right? But in your
> initial message you say "Rescheduling is done at transaction level,"
> which in my understanding means "transaction pooling". So, how does that
> part work?

Here it is:

              ChooseSession:
                DoingCommandRead = true;
                /* Select which client session is ready to send a new query */
                if (WaitEventSetWait(SessionPool, -1, &ready_client, 1,
                                     PG_WAIT_CLIENT) != 1)
                    ...
                if (ready_client.fd == SessionPoolSock)
                {
                    /* Here we handle the case of attaching a new session */
                    ...
                }
                else /* and here we handle the case when there is a query
                      * (new transaction) from some client */
                {
                    elog(DEBUG2, "Switch to session %d in backend %d",
                         ready_client.fd, MyProcPid);
                    CurrentSession = (SessionContext *) ready_client.user_data;
                    MyProcPort = CurrentSession->port;
                }

>
> 2) How does this deal with backends for different databases? I don't see
> any checks that the requested database matches the backend database (not
> any code switching the backend from one db to another - which would be
> very tricky, I think).
As I wrote in the initial mail, this problem is not handled yet.
It is expected that all clients are connected to the same database using
the same user.
I only check this assumption and report an error if it is violated.
Definitely it should be fixed, and it is one of the main challenges with
this approach! I would like to receive some advice from the community
about the best way of solving it.
The problem is that we get the information about database/user in the
ProcessStartupPackage function in the backend, when the session is
already assigned to a particular backend.
We either have to somehow redirect the session to some other backend
(somehow notify the postmaster that we are not able to handle it),
or obtain the database/user name in the postmaster. But that means
ProcessStartupPackage would have to be called in the postmaster, and the
postmaster would have to read from the client's socket.
I am afraid that the postmaster could become a bottleneck in this case.

The problem can be solved much more easily in the pthread version
of Postgres. In that case reassigning a session to another executor
(thread) can be done much more easily,
and there is no need to use the unportable trick of passing a file
descriptor to another process.
In the future I am going to combine them. The problem is that the pthread
version of Postgres is still in a very raw state.

> 3) Is there any sort of shrinking the pools? I mean, if the backend is
> idle for certain period of time (or when we need backends for other
> databases), does it get closed automatically?

When a client disconnects, its session is closed. But the backend is not
terminated even if there are no more sessions in this backend.
It was done intentionally, to avoid permanent spawning of new processes
when there are one or a few clients which frequently connect to and
disconnect from the database.
>
>
> Furthermore, I'm rather confused about the meaning of session_pool_size.
> I mean, that GUC determines the number of backends in the pool, it has
> nothing to do with sessions per se, right? Which would mean it's a bit
> misleading to name it "session_..." (particularly if the pooling happens
> at transaction level, not session level - which is question #1).
>
Yeah, it is not the right name. It means the maximal number of backends
which should be used to serve client sessions.
But "max backends" is already used and has a completely different meaning.

> When I've been thinking about adding a built-in connection pool, my
> rough plan was mostly "bgworker doing something like pgbouncer" (that
> is, listening on a separate port and proxying everything to regular
> backends). Obviously, that has pros and cons, and probably would not
> serve the threading use case well.

And we will get the same problem as with pgbouncer: one process will not
be able to handle all connections...
Certainly it is possible to start several such scheduling bgworkers...
But in any case it is more efficient to multiplex sessions in the
backends themselves.

> But it would have some features that I find valuable - for example, it's
> trivial to decide which connection requests may or may not be served
> from a pool (by connection to the main port or pool port).
>
> That is not to say the bgworker approach is better than what you're
> proposing, but I wonder if that would be possible with your approach.
>
>
> regards
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
