[Pljava-dev] advice needed

From: info at wyse-systems(dot)ltd(dot)uk (George)
To:
Subject: [Pljava-dev] advice needed
Date: 2005-02-17 00:31:20
Message-ID: 200502170032.j1H0WKa20621@eagle.cqhost.net
Lists: pljava-dev

On Wed, 16 Feb 2005 23:29:04 +0100, Thomas Hallgren wrote:

>>// client interface method
>>public ResultSet getResult();
>>The receiver presented in the assignRowValues is already a ResultSet consisting of a single row. If you are worried about type matching you can easily check for that by gathering the appropriate meta data (as you do with a normal ResultSet) or throw an exception if type constraints have been violated (see below).
>>Why is it a problem to supplement it with a ResultSet object containing not one row, but the entire set and then iterate over it *internally*, using next?
>Let's assume that you want to return a set that cannot be expressed as a
>single query.

What's the likelihood of that happening?

If I have defined a type and then a SETOF-returning function, I would
like to return homogeneous data; otherwise I may as well define an
ArrayList, stuff it with all sorts of objects and use the iterator to
iterate over it (therefore no SETOF needed at all).

Even if I assume for the moment that you are correct and more likely
than not I would need to gather information from different sources and
prepare the ResultSet (*highly* unlikely, but for the sake of the
argument let's assume that this is the case) then the method described
above can return the entire ResultSet by building each row within the
method itself (no multiple calls necessary), i.e. I can call the
standard rs.moveToInsertRow method as many times as I like until my
ResultSet is ready to be returned.

I don't see why I would need a method which is called separately and
gives me the row number. I would never use it anyway, since it is
going to match the 'next' method call, and even if I did need it, it is
easy enough to set up a class variable called count which increases
each time a call is made (all this is academic, of course). With a
single call I can build the entire result set, whether gathered from
multiple sources or a single one. Again, since I'll most likely be
interacting with the database, I would need to use java.sql.ResultSet
anyway.

>Each row in your result contains data that you create from
>one or more sources. The source might be a socket, a file, a query
>combined with other sources, etc.

Again, what is the likelihood of that happening? Why would I need to
create a SETOF function involved in a query to gather data from a file
or a socket, for example? Even if I do, as I pointed out above, this can
easily be built within the getResult() method call and the resulting
set (i.e. java.sql.ResultSet) created there and then. There is no need
for multiple calls and no need for passing arguments unnecessarily, so
where is the 'fair amount of code' or the 'unacceptable amount of
memory'? No prizes for guessing which one would perform better:
calling a method once and returning the result in a single statement, or
calling another method 1000s of times and passing two parameters each
time the method is called.

It is your call.

>Point is, you don't have a ResultSet
>to return. Using the current approach you have no problem doing this.
>You simply update the row that is passed in with data from your sources,
>once for each row, and you're all set. The tailor made, single row
>ResultSet object that is passed to this method is of course reused for each
>call.

Actually the receiver is just that - a result set! One row or
multiple rows - it doesn't matter much, since it *has* to be of the same
type, so why bother building it step by step when it can be built in a
single call?

>
>You must look at this ResultSet object, not as a "set" per se, but as a
>single Tuple. There's no way to position within this set or add a row.

Yes, there is - when you build a result set (that is, a ResultSet as
specified by java.sql) you can always use rs.moveToInsertRow and
rs.updateXXX as many times as you like. Again, this won't be needed if
an SQL query is used to get the results. It is only going to be needed
if I decide for whatever reason to integrate my Java code within the
database and *not* use any database interaction whatsoever, but some
fancy information sources instead, as you are suggesting above.

>So current approach:
>~~~~~~~~~~~~
>on first call:
> 1. create a single-row ResultSet object.
> 2. call assignRowValues.
> 3. process result (extract the tuple data from the ResultSet object).
>
>on each subsequent call:
> 1. call assignRowValues using the same single-row ResultSet object.
> 2. process result (extract the tuple data from the ResultSet object).
>
>Now, with your suggested approach you have two choices:
>1. Build a SyntheticResultSet in memory and return it.

Nope! A standard java.sql ResultSet will do fine, thank you (which is
*not* held entirely in memory; values are retrieved on the calls to
next and getXXX - a *big* plus).

> A fair amount of code and the result might consume an unacceptable amount of memory.

Not at all!

When you build the result set all you have to do is st.executeQuery
(which you will do anyway if you have to gather data from the database)
and ... well ... that's it - return the ResultSet to the caller and
don't worry about it.

>2. Create your own implementation of ResultSet where you are the
>implementor of the next() method. This is a great deal of work.

Why would I want to do that? What I was suggesting in my previous
posting is that pljava handles the iteration over the resulting
ResultSet object, depending on whether or not PostgreSQL needs
single-row iteration. In other words:

within the C function you call the Java class to return the ResultSet
(i.e. getResult) and then, if single-step iteration is required,
pljava (i.e. the C function or whatever the internal implementation is)
iterates over the set - that is it. From the developer's point of view
the only 'great deal of work' is supplying the entire ResultSet, which,
when it is the result of an SQL query, is a piece of cake. Otherwise
the steps performed to build it will be *exactly the same* (if not
better) as if a single-row method is called, but more efficient, since
the parameter passing and single-row ResultSet creation would not
exist.

>A lot more than just implementing the assignRowValues method.

Am I talking a different language here? See above!

>
>New approach:
>~~~~~~~~~~~~
>on first call:
> 1. call getResult()
> 2. call next() on the obtained ResultSet
> 3. process result (extract the tuple from the ResultSet object).
>
>on each subsequent call:
> 1. call next()
> 2. process result (extract the tuple from the ResultSet object).
>
>Same amount of work for both approaches. The first approach doesn't
>suffer from any of the disadvantages that the second approach has so
>there's a good motivation to keep it.

This is not at all what I was suggesting.

All I was suggesting is that:

1. assignRowValues method be scrapped completely;
2. replaced with a single interface method (I used getResult, but I
think I should have called it getJDBCResultSet instead for clarity),
which returns ResultSet (as in java.sql) and that result set is
processed either as a whole or in a loop internally depending on
whether PostgreSQL internals need it to be preprocessed step by step -
all that *without any further involvement of the client interface*.

In other words the client only implements getJDBCResultSet method -
*THAT IS IT*!
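
To make the proposal concrete, here is a minimal sketch of how it might look. Everything in it is an assumption made for illustration: the interface name JdbcResultSetSource, the drain helper standing in for the pljava side, and the list-backed ResultSet, which exists only so the example runs without a database. It is not pljava's actual API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical client interface: the only method a developer implements. */
interface JdbcResultSetSource {
    ResultSet getJDBCResultSet() throws SQLException;
}

class SingleCallSketch {

    /** Stand-in for the runtime side: it iterates internally, one row per step. */
    static List<String> drain(JdbcResultSetSource source) throws SQLException {
        List<String> tuples = new ArrayList<>();
        ResultSet rs = source.getJDBCResultSet();   // client code is called exactly once
        try {
            while (rs.next()) {                     // the loop lives outside client code
                tuples.add(rs.getString(1));
            }
        } finally {
            rs.close();
        }
        return tuples;
    }

    /** Demo-only ResultSet over a list; supports next(), getString(...) and close(). */
    static ResultSet listBackedResultSet(List<String> values) {
        Iterator<String> it = values.iterator();
        String[] current = new String[1];
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "next":
                    if (!it.hasNext()) return false;
                    current[0] = it.next();
                    return true;
                case "getString":
                    return current[0];
                case "close":
                    return null;
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (ResultSet) Proxy.newProxyInstance(
                SingleCallSketch.class.getClassLoader(),
                new Class<?>[] { ResultSet.class },
                handler);
    }

    /** Runs the sketch end to end; in real code the source would return st.executeQuery(...). */
    static List<String> runDemo() {
        JdbcResultSetSource source = () -> listBackedResultSet(Arrays.asList("alpha", "beta"));
        try {
            return drain(source);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] argv) {
        System.out.println(runDemo());   // prints [alpha, beta]
    }
}
```

The client side is the one-line lambda in runDemo; the row-by-row stepping stays in drain, which is the part the runtime would own under this proposal.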

I fail to see where the 'fair amount of code' or the 'large memory
consumption' you are suggesting would come from.

>
>
>>That I think would be much easier for developers like myself to handle - all I have to worry about then is to prepare the ResultSet in the way I want it, without the need to get bogged down in implementing iterations and messing about with row numbers and the like.
>>
>>
>Yes. The use case you have, when a function actually executes a query
>and wants to return the result of that query is a good motivation to add
>the new approach.

Which is what happens in 99% of all cases - you have to use a java.sql
ResultSet in every database interaction - there is no other choice -
the data retrieved comes through this result set.

The primary motive for putting Java code in a database is that it stays
as close to the data as possible, so that queries are much quicker to
execute and so are the updates. Otherwise why would you need to put
your Java code in the database and run classes that make socket
connections to different places, as you suggested? If that was the case
then I may as well have a separate application server to host the jar
file and not bother with the damn thing!

>
>>Better still, you can define and 'fire' different events through the entire process to give developers more control of what is being done. If adopted I have a suggestion of (at least) three such events (I suspect these methods will be in addition to the ones controlling the pool behaviour, like 'make', 'activate', 'passivate' and 'destroy'):
>>
>>public void initialise(); // fired before the getResult() interface method is actually called to give the client class a chance to initialise itself
>>public void lastRowProcessed(); // fired after the last row of the ResultSet has been processed
>>public void processException(Exception e); // fired when a pljava/processing exception has occurred
>>
>>
>I assume that 'activate' is the same as 'initialise' and 'passivate' is
>the same as 'lastRowProcessed'?

Not quite! Activate occurs after initialise; make occurs before
anything else (make instantiates it, i.e. creates a new object, which
does not necessarily happen on every use with a pool, since objects are
reused). lastRowProcessed occurs as soon as the last row has been
processed, but before the object has been put back into the pool. One
example of the use of these two methods: use lastRowProcessed to
reinitialise text patterns and write transaction logs for a successful
operation, and use passivate to validate the object before it is placed
back in the pool (this has a special meaning for the pool factory).
Just to clarify the events as they occur:

new Class();
fire make // after that the object is placed in the pool
// when needed, the object is taken from the pool
fire initialise // validation, re-initialisation of internal variables and states may occur here
fire activate
call getJDBCResultSet
// process the results
fire lastRowProcessed // when the last row of the result set has been processed
fire passivate
// when the object needs to be destroyed
fire destroy
class = null // to be processed by the gc later

processException will be invaluable if an exception occurs in the
processing of the ResultSet (i.e. internally, within pljava), so that it
can be propagated on (to be logged or appropriate action taken) to the
Java code - i.e. the client interface would know about it. It will
only serve as a feedback method, nothing else.
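
The event order above can be sketched roughly as follows. The interface and class names are hypothetical, an Iterator<String> stands in for the java.sql.ResultSet so the example needs no database, and firing passivate after a failure is my own assumption rather than something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical callbacks, named after the events listed in the message. */
interface PooledSetClient {
    void make();
    void initialise();
    void activate();
    Iterator<String> getJDBCResultSet();   // stand-in type; a real version would return java.sql.ResultSet
    void lastRowProcessed();
    void passivate();
    void destroy();
    void processException(Exception e);
}

class LifecycleSketch {

    /** Records every callback so the firing order can be inspected. */
    static class RecordingClient implements PooledSetClient {
        final List<String> events = new ArrayList<>();
        public void make() { events.add("make"); }
        public void initialise() { events.add("initialise"); }
        public void activate() { events.add("activate"); }
        public Iterator<String> getJDBCResultSet() {
            events.add("getJDBCResultSet");
            return Arrays.asList("row1", "row2").iterator();
        }
        public void lastRowProcessed() { events.add("lastRowProcessed"); }
        public void passivate() { events.add("passivate"); }
        public void destroy() { events.add("destroy"); }
        public void processException(Exception e) { events.add("processException"); }
    }

    /** One borrow/return cycle for an object that is already in the pool. */
    static void serveOneCall(PooledSetClient client, Consumer<String> processRow) {
        client.initialise();
        client.activate();
        try {
            Iterator<String> rows = client.getJDBCResultSet();
            while (rows.hasNext()) {
                processRow.accept(rows.next());   // runtime-side processing of each row
            }
            client.lastRowProcessed();
        } catch (RuntimeException e) {
            client.processException(e);           // feedback only, nothing else
        }
        client.passivate();                       // assumption: also fired after a failure
    }

    /** Full life of one object: make, one pooled use, destroy. */
    static List<String> fullLife(boolean failWhileProcessing) {
        RecordingClient client = new RecordingClient();
        client.make();
        serveOneCall(client, row -> {
            if (failWhileProcessing) {
                throw new IllegalStateException("simulated processing failure");
            }
        });
        client.destroy();
        return client.events;
    }

    public static void main(String[] argv) {
        // prints [make, initialise, activate, getJDBCResultSet, lastRowProcessed, passivate, destroy]
        System.out.println(fullLife(false));
        // prints [make, initialise, activate, getJDBCResultSet, processException, passivate, destroy]
        System.out.println(fullLife(true));
    }
}
```

In the failing run lastRowProcessed is never fired and processException takes its place, which matches its role as a pure feedback hook.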

Regards,

George
