[Pljava-dev] advice needed

From: thhal at mailblocks(dot)com (Thomas Hallgren)
To:
Subject: [Pljava-dev] advice needed
Date: 2005-02-17 07:50:02
Message-ID: thhal-0MOXvAtWSxiccCcQ/4BdDBv2p35cK05@mailblocks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pljava-dev

George,
As I wrote in my previous mail, I feel that I'm not communicating why
the current solution makes good sense. One reason may be that we talk
passed eachother. I read you mail again, looking for replies that might
indicate that things need to be further clarified. I found a couple:

>If I have defined a type and then a SET OF return function I would like
>to return a homogenous data, otherwise I may as well define an
>ArrayList and stuff it with all sort of objects and use the iterator to
>iterate over (therefore no SET OF needed at all).
>
>
When I talk about different sources, I don't mean that each row is of a
different type. I mean that each row is of the same type and that each
row can contain columns with values that originates from different
sources (like a join, but not limited to database tables).

>I can call the standard rs.moveToInsertRow method as many times as I like until my
>ResultSet is ready to be returned.
>
>
Two things:
1. Where did you get the empty ResultSet that you start with?
2. This means you build everythin in memory, no?

>Actually the receiver is just that - a result set! Being one row or
>multiple rows - it doesn't matter much since it *has* to be of the same
>type, so why bother building it step-by-step when it can be build in a
>single call?
>
>
Here's some major misconception. You seem to think that a ResultSet is
built up always. It isn't. One row at a time is streamed back to the
PostgresSQL query evaluator. No set is ever built up. One way of
accessing the set from a client is to use JDBC and there you would have
a ResultSet. Using some other client, you'd have some other mechanism.
Common for all of them is that no ResultSet containing all rows is ever
instantiated. Data is always streamed.

>>You must look at this ResultSet object, not as a "set" per se, but as a
>>single Tuple. There's no way to position within this set or add a row.
>>
>>
>
>Yes, there is - when you build a result set (that is ResultSet as
>specified by java.sql) you can always use rs.moveToInsertRow and
>rs.updateXXX as many times as you like.
>
This seem to be another major misconception. The ResultSet that you find
in the java.sql package is an *interface*. The ResultSet that is passed
to the assignRowValues is an implementation of that interface. You have
to belive me that this implementation represents one single tuple and
that there's no way you can do positioning or adding. I should know. I
wrote it.

As I said, if java.sql had an interface that only contained all the
getXXX and updateXXX mehods and nothing else, that interface would have
been used instead. Unfortunately, no such interface exists.

>>So current approach:
>>~~~~~~~~~~~~
>>on first call:
>> 1. create a single-row ResultSet object.
>> 2. call assignRowValues.
>> 3. process result (extract the tuple data from the ResultSet object).
>>
>>on each subsequent call:
>> 1. call assignRowValues using the same single-row ResultSet object.
>> 2. process result (extract the tuple data from the ResultSet object).
>>
>>Now, with your suggested approach you have two choices:
>>1. Build a SyntheticResultSet in memory and return it.
>>
>>
>
>Nope! Standard java.sql ResultSet will do fine thank you (which is
>*not* entirely in memory but value is retracted on call to next and
>getXXX - *big* plus).
>
>
You contradict yourself. First you say that the way to do this is by
doing lots of moveToInsertRow and add values, then you say it's not
entirely in memory. Where do you think your data goes?

>>A fair amount of code and the result might consume an unacceptable amount of memory.
>>
>>
>
>Not at all!
>
>When you build the reult set all you have to do is st.executeQuery
>(which you will do anyway if you have to gather data from the database)
>
>
As stated earlier. If all you want to do is return a ResultSEt that
originates from one single executeQuery, then the idea of a getResult()
API is excellent. The whole point whith this exercise was to show how a
SETOF can be returned when you have a result that does *not* originate
from a single database query.

>>2. Create your own implementation of ResultSet where you are the
>>implementor of the next() method. This is a great deal of work.
>>
>>
>
>Why would I want to do that
>
For one simple reason. You don't want to build a ResultSet completely in
memory before you return it.

>within the C function you call the java class to return the ResultSet
>(i.e. getResult) and then if a single-step iteraction is required
>pljava (i.e. the C function or whatever the internal implementation is)
>iterates over the set - that is it.
>
No, that's not it. The C function is a streaming function and it is
called by the PostgreSQL query evaluator. It doesn't require access to
the entire set. It only wants one row. And that's the essense of
everything. No huge set of data is built up in memory, ever!

Further more, as soon as a requested row satisfies a query, no further
rows will be requested. Very significant if you for instance use your
function as the source of an ANY or IN predicate in a SELECT.

> 1. assignRowValues method be scrapped completely;

That won't happen since it's the only way to stream data from a source
to the caller of a function.

>2. replaced with a single interface method (I used getResult, but I
>think I should have called it getJDBCResultSet instead for clarity),
>which returns ResultSet (as in java.sql) and that result set is
>processed either as a whole or in a loop internally depending on
>whether PostgreSQL internals need it to be preprocessed step by step -
>all that *without any further involvement of the client interface*.
>
>
As I said, this is a good idea in cases where you already have a
ResultSet handy. But it does not give you the ability to stream data
when the source is something other than a query.

>I fail to see where is the 'fair amount of code' or the 'large memory
>consumption' as you are suggesting?
>
>
Perhaps you see it now?

>Which is what happens in 99% of all cases - you have to use java.sql
>ResultSet in every database itteration - there is no other choice -
>data retrieved is through this result set.
>
>
What if your function wants to filter some of the rows based on a rule
that cannot be expressed in a where clause?

>The primary motive to put java code in a database is that it stays as
>close to the data as possible so that queries are much quicker to
>execute and so are the updates, otherwise why would you need to put
>your java code in the database and run classes to make socket
>connections to different places as you suggested - if that was the case
>then I may as well have a separate application server to reside the jar
>file on and not bother with the damn thing!
>
>
Yeah right. And then join the result with something from the database
and add a where clause to the expression... Read - there are other
reasons to produce a SETOF that the database query evaluator can make
use of.

>>I assume that 'activate' is the same as 'initialise' and 'passivate' is
>>the same as 'lastRowProcessed'?
>>
>>
>
>Not quite! Activate occurs after initialise, make occurs before
>anything else (make instantiates it - i.e. creates new class, which is
>not necessarily the case with a pool since objects are reused).
>
>
So make *is* the constructor. It's only called when the object is
created, not when it is reused.

>lastRowProcessed occurs as soon as the last row has been processed, but
>before the object has been put back into the pool. One example of use
>of these two methods is: use lasRowProcessed to reinitialise text
>patterns and write transaction logs for successful operation, use
>passivate to validate the object before it is placed back in the pool
>(this has a special meaning for the pool factory). Just to clarify the
>events as they occur:
>
>new Class();
>fire make // after that object is placed in the pool
>// when needed object is taken from the pool
>fire initialise // validation, re-initialisation of internal variables
>and states may occur here
>fire activate
>call getJDBCResultSet
>// process the results
>fire lastRowProcessed when the last row of the result set has been
>processed
>fire passivate
>// when object needs to be destroyed
>fire destroy
>class = null // to be processed by the gc later
>
>
Right. Nothing happens between initialise and activate and nothing
happens between lastRowProcessed and passivate. So why have four calls
instead of two? If you want to call a lastRowProcessed first thing from
within your passivate, then do so. If not, you've just saved a call. Why
should pljava always do this call? Seems to me you are forced to
implement two methods that you are unlikely to need and pljava is forced
to make four calls instead of two.

- thomas

In response to

Browse pljava-dev by date

  From Date Subject
Next Message Robby Russell 2005-02-19 01:47:21 [Pljava-dev] minor issue on documentation page
Previous Message Thomas Hallgren 2005-02-17 01:51:07 [Pljava-dev] advice needed