Re: Shared memory

From: Thomas Hallgren <thomas(at)tada(dot)se>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: PL/Java Development <Pljava-dev(at)gborg(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shared memory
Date: 2006-03-27 12:48:23
Message-ID: 4427DF17.7040700@tada.se
Lists: pgsql-hackers pljava-dev

Martijn van Oosterhout wrote:
> On Mon, Mar 27, 2006 at 10:57:21AM +0200, Thomas Hallgren wrote:
>> Martijn,
>>
>> I tried a Socket approach. Using the new IO stuff that arrived with Java
>> 1.4 (SocketChannel etc.), the performance is really good. Especially on
>> Linux where an SMP machine shows a 1 to 1.5 ratio between one process doing
>> ping-pong between two threads and two processes doing ping-pong using a
>> socket. That's acceptable overhead indeed and I don't think I'll be able to
>> trim it much using a shared memory approach (the thread scenario uses Java
>> monitor locks. That's the most efficient lightweight locking implementation
>> I've come across).
>
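The in-process half of that ping-pong comparison can be sketched with plain Java monitor locks. This is a minimal illustration, not the benchmark that produced the numbers quoted above; the class name `MonitorPingPong`, its methods, and the round count are assumptions made for the example.

```java
class MonitorPingPong {
    private final Object lock = new Object();
    private boolean pingTurn = true; // guarded by lock
    private int handoffs = 0;        // guarded by lock

    // Waits until it is this side's turn, counts the hand-off, then passes the turn on.
    private void play(boolean iAmPing, int rounds) {
        synchronized (lock) {
            try {
                for (int i = 0; i < rounds; i++) {
                    while (pingTurn != iAmPing) {
                        lock.wait(); // releases the monitor while waiting
                    }
                    handoffs++;
                    pingTurn = !iAmPing;
                    lock.notifyAll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Runs the given number of round trips between two threads and
    // returns the total number of hand-offs (two per round trip).
    static int run(int rounds) {
        MonitorPingPong table = new MonitorPingPong();
        Thread pong = new Thread(() -> table.play(false, rounds));
        pong.start();
        table.play(true, rounds);
        try {
            pong.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (table.lock) {
            return table.handoffs;
        }
    }

    public static void main(String[] args) {
        int rounds = 100_000; // arbitrary; increase for steadier timings
        long start = System.nanoTime();
        int handoffs = run(rounds);
        long elapsed = System.nanoTime() - start;
        System.out.println(handoffs + " hand-offs, "
                + (elapsed / rounds) + " ns per round trip");
    }
}
```

The two-process variant would replace the monitor hand-off with a blocking read and write on a socket; timings from a sketch like this depend heavily on the JVM, the OS scheduler, and warm-up.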
> Yeah, it's fairly well known that the distinction between processes
> and threads on Linux is much smaller than on other OSes. Windows is
> pretty bad, which is why threading is much more popular there.
>
>> The real downside is that a call from SQL to PL/Java using the current
>> in-process approach is really fast. It takes about 5 micro secs on my
>> 2.8GHz i386 box. The overhead of an IPC-call on that box is about 18 micro
>> secs on Linux and 64 micro secs on Windows. That's an overhead of between
>> 440% and 1300% due to context switching alone. Yet, for some applications,
>
> <snip>
>
> This might take some more measurements but AIUI the main difference
> between in-process and inter-process is that one has a JVM per
> connection, the other one shared JVM. In that case my thoughts are
> as follows:
>
> - Overhead of starting JVM. If you can start the JVM in the postmaster
> you might be able to avoid this. However, if you have to restart the
> JVM for each process, that's a cost.
>
> - JIT overhead. For often used classes JIT compiling can help a lot
> with speed. But if every class needs to be reinterpreted each time,
> maybe that costs more than your IPC.
>
> - Memory overhead. You mentioned this already.
>
> - Are you optimising for many short-lived connections or a few
> long-lived connections?
>
> My gut feeling is that if someone creates a huge number of server-side
> Java functions, performance will be better by having one always
> running JVM with highly JIT optimised code than having each JVM doing
> it from scratch. But this will obviously need to be tested.
>
The use case with a huge number of short-lived connections is not feasible at all with
PL/Java as it stands today. This is partly the reason for my current research. Another
reason is that it's sometimes desirable to share resources between your connections.
Dangerous perhaps, but an API that encourages separation and allows sharing in a controlled
way might prove very beneficial.

The ideal use-case for PL/Java is a client that utilizes a connection pool, and most servlet
containers and EJB servers do. Scenarios where you have just a few, fairly long-lived
clients are OK too.

> One other thing is that separate processes give you the ability to
> parallelize. For example, if a Java function does an SPI query, it can
> receive and process results in parallel with the backend generating
> them. This may not be easy to achieve with an in-process JVM.
>

It is fairly easy to achieve using threads. Only one thread at a time may execute an SPI
query, of course, but that's true when multiple processes are in place too: the backend is
single-threaded, and the logical thread in PL/Java must use the same backend where the call
originated (to maintain the transaction boundaries). Any result must also sooner or later
be delivered using that same backend, which further limits the ability to parallelize.
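
As a rough illustration of the thread-based approach, the sketch below lets the calling thread stand in for the backend (the only thread that fetches rows) while a worker thread processes rows as they arrive. It does not use the PL/Java or SPI API; the class name `ParallelResultSketch`, the queue size, and the `END` sentinel are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ParallelResultSketch {
    // Sentinel marking the end of the row stream (hypothetical convention).
    private static final Integer END = Integer.MIN_VALUE;

    // The calling thread plays the single-threaded backend and produces rows.
    // A worker thread processes each row while the next one is being produced.
    static List<Integer> run(int rowCount) {
        BlockingQueue<Integer> rows = new ArrayBlockingQueue<>(16);
        List<Integer> processed = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                for (Integer row = rows.take(); !row.equals(END); row = rows.take()) {
                    processed.add(row * 2); // stand-in for real per-row work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (int i = 0; i < rowCount; i++) {
                rows.put(i); // stand-in for fetching one row from the backend
            }
            rows.put(END);
            worker.join(); // results return to the thread that owns the backend
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming rows", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // prints [0, 2, 4, 6, 8]
    }
}
```

The final `join()` mirrors the constraint described above: whatever the worker computes still has to be handed back on the thread tied to the originating backend.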

> Incidentally, there are compilers these days that can compile Java to
> native. Is this Java stuff set up in such a way that you can compile your
> classes to native and load directly for the real speed-freaks?

PL/Java can be used with GCJ although I don't think the GCJ compiler outranks the JIT
compiler in a modern JVM. It can only do static optimizations whereas the JIT has runtime
heuristics to base its optimizations on. In the test results I've seen so far, the GCJ
compiler only gets the upper hand in very simple tests. The JIT generated code is faster
when things are more complicated.

GCJ is great if you're using short-lived connections (less startup time and everything is
optimized from the very start) but the native code that it produces still needs a runtime of
some sort. No interpreter of course, but classes must be initialized, a garbage collector
must be running, etc. The shared native code saves some memory but the saving is
not as significant as one might think.

> In that
> case, maybe you should concentrate on reliability and flexibility and
> still have a way out for functions that *must* be high-performance.
>

Given time and enough resources, I'd like to provide the best of both worlds and give the
user a choice whether or not the JVM should be external. Ideally, this should be controlled
using configuration parameters so that it's easy to test which scenario works best. It's
a lot of work though.

It very much comes down to your point "Are you optimising for many short-lived connections
or a few long-lived connections?"

If the use-cases for the former are fairly few then I'm not sure it's worth the effort. In
my experience, that is the case. People tend to use connection pools nowadays. But that's me
and my opinion. It would be great if more people were involved in this discussion.

Regards,
Thomas Hallgren
