Re: Shared memory

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Thomas Hallgren <thomas(at)tada(dot)se>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PL/Java Development <Pljava-dev(at)gborg(dot)postgresql(dot)org>
Subject:	Re: Shared memory
Date:	2006-03-28 17:56:22
Message-ID:	1143568582.32384.27.camel@localhost.localdomain
Lists:	pgsql-hackers pljava-dev

On Tue, 2006-03-28 at 17:48 +0200, Thomas Hallgren wrote:

> Simon Riggs wrote:
> > Just some thoughts from afar: DB2 supports in-process and out-of-process
> > external function calls (UDFs) that it refers to as UNFENCED and FENCED
> > procedures. For Java only, IBM have moved to supporting *only* FENCED
> > procedures for Java functions, i.e. having a single JVM for all
> > connections.
> >
> Are you sure about this?

Yes.

> As I recall it, a FENCED stored procedure executed in a remote JVM
> of its own. A parameter could be used that either caused a new JVM to be instantiated for
> each stored procedure call or to be kept for the duration of the session. The former would
> yield really horrible performance but keep memory utilization at a minimum. The latter would
> get a more acceptable performance but waste more memory (on par with PL/Java today).

In the previous release, yes.

> > That approach definitely does increase the invocation time, but it
> > significantly reduces the resources associated with the JVM, as well as
> > allowing memory management to be more controllable (bliss...). So the
> > overall picture could be more CPU and memory resources for each
> > connection in the connection pool.
> >
> My very crude measurements indicate that the overhead of using a separate JVM is between
> 6 and 15MB of real memory per connection. Today, you get about 10MB/$ and servers configured
> with 4GB RAM or more are not uncommon.
>
> I'm not saying that the overhead doesn't matter. Of course it does. But the time when you
> needed to be extremely conservative with memory usage has passed. It might be far less
> expensive to buy some extra memory than to invest in SMP architectures to minimize IPC overhead.
>
> My point is, even fairly large app-servers (using connection pools with up to 200
> simultaneous connections) can run using relatively inexpensive boxes such as an AMD64 based
> server with 4GB RAM and show very good throughput with the current implementation.

Memory is cheap, memory bandwidth is not.

All CPUs have limited cache resources, so the more mem you waste, the
less efficient your CPUs will be.

That affects the way you do things, sure. 1GB lookup table: no problem.
10MB wasted memory retrieval: lots of dead CPU time.
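
For scale, the figures quoted above (6 to 15MB of JVM overhead per connection, pools of up to 200 connections, a 4GB server) can be multiplied out. The following is only a back-of-the-envelope sketch: the class and method names are made up for illustration and are not part of PL/Java or PostgreSQL, and it assumes every pooled connection carries its own JVM.

```java
// Back-of-the-envelope arithmetic using only figures quoted in this thread.
// Class and method names are illustrative; nothing here is a PL/Java API.
final class JvmMemoryEstimate {

    /** Total JVM overhead in MB when each connection has its own JVM. */
    static long totalOverheadMb(int connections, long perConnectionMb) {
        return (long) connections * perConnectionMb;
    }

    /** Overhead expressed as a percentage of physical RAM. */
    static double percentOfRam(long overheadMb, long ramMb) {
        return 100.0 * overheadMb / ramMb;
    }

    public static void main(String[] args) {
        int connections = 200;    // pool size mentioned in the thread
        long ramMb = 4L * 1024;   // 4GB server mentioned in the thread

        // Lower and upper bound of the per-connection overhead from the thread.
        for (long perConnectionMb : new long[] {6, 15}) {
            long total = totalOverheadMb(connections, perConnectionMb);
            System.out.printf("%d MB/connection -> %d MB total (%.1f%% of RAM)%n",
                    perConnectionMb, total, percentOfRam(total, ramMb));
        }
    }
}
```

Under those assumptions the pool's JVM overhead lands between 1200MB and 3000MB, i.e. roughly 29% to 73% of the 4GB box. This says nothing about cache behaviour, only about capacity.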

> > If you have a few small Java functions centralisation would not be good,
> > but if you have a whole application architecture with many connections
> > executing reasonable chunks of code then this can be a win.
> >
> One thing to remember is that a 'chunk of code' that executes in a remote JVM and uses
> JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e. the
> overhead is not just limited to the actual call of the UDF, it's also imposed on all
> database accesses that the UDF makes in turn.
>
>
> > In that environment we used Java for major database functions, with SQL
> > functions for small extensions.
> >
> My guess is that those major database functions did a fair amount of JDBC. Am I right?

Not once I'd reviewed them...
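
The IPC point quoted above can be put as a simple cost model: with a remote JVM, one round trip is paid for the UDF call itself plus one for every JDBC interaction the function makes. The sketch below is only that model written out. The class name, method name and the 50 microsecond round-trip figure are hypothetical placeholders, not measurements from PL/Java, DB2 or anything else in this thread.

```java
// Minimal cost model for a UDF running in a remote (out-of-process) JVM.
// All names and the round-trip figure are hypothetical placeholders.
final class IpcCostModel {

    /**
     * Extra time spent on IPC for one UDF invocation: one round trip for
     * the call itself plus one per JDBC interaction made by the function.
     */
    static long addedIpcMicros(int jdbcInteractions, long ipcRoundTripMicros) {
        return (1L + jdbcInteractions) * ipcRoundTripMicros;
    }

    public static void main(String[] args) {
        long assumedRoundTripMicros = 50;  // assumption, not a measurement

        // A function that does no JDBC pays once; a JDBC-heavy one pays per access.
        for (int jdbcInteractions : new int[] {0, 10, 1000}) {
            System.out.printf("%d JDBC interactions -> %d us of added IPC%n",
                    jdbcInteractions,
                    addedIpcMicros(jdbcInteractions, assumedRoundTripMicros));
        }
    }
}
```

The model makes the trade-off visible: the overhead grows linearly with the number of database accesses, so functions that do little JDBC are the ones that suit a shared JVM.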

> > Also the Java invocation time we should be celebrating is that by having
> > Java in the database the Java<->DB time is much less than it would be if
> > we had a Java stack sitting on another server.
> >
>
> I think the cases when you have a Tomcat or JBoss sitting on the same physical server as the
> actual database are very common. One major reason being that you don't want network overhead
> between the middle tier and the backend. Moving logic into the database instead of keeping
> it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC.

I can see the performance argument for both, but supporting both,
especially in a mix-and-match architecture, is much harder.

Anyway, just trying to add some additional perspective.

Best Regards, Simon Riggs

In response to

Re: Shared memory at 2006-03-28 15:48:00 from Thomas Hallgren