From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: david(at)lang(dot)hm
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Flavio Henrique Araque Gurgel <flavio(at)4linux(dot)com(dot)br>, Fabrix <fabrixio1(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Scalability in postgres
Date: 2009-06-05 03:37:01
Message-ID: 4A2892DD.3090809@mark.mielke.cc
Lists: pgsql-performance
david(at)lang(dot)hm wrote:
> On Thu, 4 Jun 2009, Mark Mielke wrote:
>> You should really only have as 1X or 2X many threads as there are
>> CPUs waiting on one monitor. Beyond that is waste. The idle threads
>> can be pooled away, and only activated (with individual monitors
>> which can be far more easily and effectively optimized) when the
>> other threads become busy.
> sometimes the decrease in complexity in the client makes it worthwhile
> to 'brute force' things.
> this actually works well for the vast majority of services (including
> many databases)
> the question is how much complexity (if any) it adds to postgres to
> handle this condition better, and what those changes are.
Sure. Locks that are not generally contended, for example, don't deserve
the extra complexity. Locks that have any expected frequency of a
"context storm" though, probably make good candidates.
>> An alternative approach might be: 1) Idle processes not currently
>> running a transaction do not need to be consulted for their snapshot
>> (and other related expenses) - if they are idle for a period of time,
>> they "unregister" from the actively used processes list - if they
>> become active again, they "register" in the actively used process list,
> how expensive is this register/unregister process? if it's cheap
> enough do it all the time and avoid the complexity of having another
> config option to tweak.
Not really relevant if you look at the "idle for a period of time". An
active process would not unregister/register. An inactive process,
though, after it is not in a commit, and after it hits some time that is
many times more than the cost of unregister + register, would free up
other processes from having to take this process into account, allowing
for better scaling. For example, let's say it doesn't unregister itself
for 5 seconds.
>> and 2) Processes could be reusable across different connections -
>> they could stick around for a period after disconnect, and make
>> themselves available again to serve the next connection.
> depending on what criteria you have for the re-use, this could be a
> significant win (if you manage to re-use the per-process cache much),
> but this is far more complex.
Does it need to be? From a naive perspective - what's the benefit of a
PostgreSQL process dying, and a new connection getting a new PostgreSQL
process? I suppose bugs in PostgreSQL don't have the opportunity to
affect later connections, but overall, this seems like an unnecessary
cost. I was thinking of either: 1) The Apache model, where a PostgreSQL
process waits on accept(), or 2) When the PostgreSQL process is done, it
does connection cleanup and then it waits for a file descriptor to be
transferred to it through IPC and just starts over using it. Too hand
wavy? :-)
>> Still heavy-weight in terms of memory utilization, but cheap in terms
>> of other impacts. Without the cost of connection "pooling" in the
>> sense of requests always being indirect through a proxy of some sort.
> it would seem to me that the cost of making the extra hop through the
> external pooler would be significantly more than the overhead of idle
> processes marking themselves as such so that they don't get consulted
> for MVCC decisions
They're separate ideas to be considered separately on the complexity vs
benefit merit.
For the first - I think we already have an "external pooler", in the
sense of the master process which forks to manage a connection, so it
already involves a possible context switch to transfer control. I guess
the question is whether or not we can do better than fork(). In
multi-threaded programs, it's definitely possible to outdo fork using
thread pools. Does the same remain true of a multi-process program that
communicates using IPC? I'm not completely sure, although I believe
Apache does achieve this by having the working processes do accept()
rather than some master process that spawns off new processes on each
connection. Apache re-uses the process.
Cheers,
mark
--
Mark Mielke <mark(at)mielke(dot)cc>
Next Message: Greg Smith | 2009-06-05 04:13:34 | Re: Scalability in postgres
Previous Message: Robert Haas | 2009-06-05 02:07:46 | Re: Scalability in postgres