Re: COPY support in pgsql-jdbc driver

From: Barry Lind <barry(at)xythos(dot)com>
To: Michael Adler <adler(at)glimpser(dot)org>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, Sam Varshavchik <mrsam(at)courier-mta(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: COPY support in pgsql-jdbc driver
Date: 2002-06-20 05:05:14
Message-ID: 3D11628A.70002@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Michael,

Overall I think this looks like a good start. While I could think of
some ways to hack this support into the regular codepath, I think having
the functionality exposed as separate methods like this is easier and
cleaner. Some suggestions I do have would be (I realize my list may
seem long, but I am just trying to be complete in my review and feedback):

1) The copyIn method should take an InputStream instead of a
ByteArrayInputStream since forcing the entire dataset to be in memory
would be a problem for a large table.

2) Same comment as 1 for copyOut(). But in this case instead of
returning a ByteArrayOutputStream, I believe the correct way to handle
this is to take an OutputStream as a parameter to write the output to.

3) The patch should contain updates to the documetation that explain the
new methods and how they are used.

4) Both methods should do a synchronized(pg_stream) { ... }

5) Error messages should come from the properties bundle and not be hard
coded english text.

6) Some debugging output would probably be usefull as well, (see current
sources and the isDebug method and how it is used).

7) The testcase should be done using junit and added to the tests under
org/postgresql/test

8) Need to decide how to handle character set conversions, since you are
not currently doing any character set conversions for either the input
or output. Since the client character set may be different than the
server character set, this needs to be considered. You probably need an
additional argument to each method for the character set to use
(probably also have methods without the extra parameter that assume the
default jvm character set should be used). You can probably optimize
this if you know that the source and target character set are the same
to be a noop.

9) I think the logic that looks for the end of data marker can be more
efficient. Off the top of my head (without giving too much thought to
it) something along the lines of:
read from stream into a buffer
loop through the buffer spitting out its contents while byte != '\\'.
When you find a '\\' in the stream then look forward two characters
and handle accordingly.

Reading one byte at a time from the stream will be slow, that is why it
would be better to read into a buffer.

thanks,
--Barry

PS. I hope you have more time in the future to contribute additional
features to the jdbc driver codebase.

Michael Adler wrote:

>I'm trying to add COPY support to the org.postgresql driver. This is my
>first work with pgsql below the "application" level.
>
>The first challenge is that the response to a COPY query is not a result
>set, so the assumptions of the QueryExecutor don't apply nicely here
>(execute() returns a ResultSet). I didn't investigate the possibility of
>making the copy output behave like a resultset, although I guess it's an
>option.
>
>The solution I came up with is to add two methods to the
>org.postgresql.Connection class and have them manage the entire transfer.
>This is not beautiful, but it's a start. Someone who is "intimate" with
>the project will no doubt have a better idea. (although the copy protocol
>seems to be the exception to the rest of the protocol, so perhaps the
>code would reflect that).
>
>The copyOut() method works well enough to read actual data from the
>backend.
>
>The copyIn() methods does not work. I'm not sure why.
>
>I suspect one problem is my mishandling of the pg_stream and the fe/be
>protocol.
>
>Two other shortcomings of the code - the lack of exception handling and
>the lack of synchronization of the pg_stream. I'm not sure if the later is
>a problem, but those can be addressed later.
>
>
>// in org.postgresql.Connection :
>
>
>
> // *****************
> // Postgres COPY handling
> // *****************
>
> /*
> * This will take the name of a table, construct a COPY OUT query, send the query
> * ( while bypassing QueryExecutor), receive the resulting bytes of data and return
> * a ByteArrayOutputStream.
> *
> */
>
> public ByteArrayOutputStream copyOut(String table) throws Exception
> {
> ByteArrayOutputStream out = new ByteArrayOutputStream();
>
> // duplicates statements in QueryExecutor.sendQuery
> pg_stream.SendChar('Q');
> pg_stream.Send(this.getEncoding().encode( "COPY " + table + " TO STDOUT" ));
> pg_stream.SendChar(0);
> pg_stream.flush();
>
> // check response from backend
> int response = pg_stream.ReceiveChar();
>
> if (response != 'H') {
> throw new Exception("Copy should receive H from backend, but instead received: " + (char)response );
> }
>
> // read input stream one char at a time, but always holding three
> int a = pg_stream.ReceiveChar();
> int b = pg_stream.ReceiveChar();
> int c = pg_stream.ReceiveChar();
>
> while (true) {
> if ( a == '\\' && b == '.' && c == '\n' ) {
> // this sequence of bytes means the copy is over
> break;
> }
>
> out.write(a);
>
> a = b;
> b = c;
> c = pg_stream.ReceiveChar();
> }
>
> String str = pg_stream.ReceiveString(this.getEncoding());
> System.out.println( "Received String " + str );
>
> return out;
> }
>
>
> /*
> * This will take the name of a table and a ByteArrayInputStream, construct a COPY IN query,
> * send the query ( while bypassing QueryExecutor), send the bytes of data and send the 3 bytes
> * that signify the end of the copy
> *
> */
>
> public void copyIn (String table, ByteArrayInputStream in) throws Exception
> {
> // duplicates statements in QueryExecutor.sendQuery
> pg_stream.SendChar('Q');
> pg_stream.Send(this.getEncoding().encode( "COPY " + table + " FROM STDIN " ));
> pg_stream.SendChar(0);
> pg_stream.flush();
>
> // check response from backend
> int response = pg_stream.ReceiveChar();
>
> if (response != 'G') {
> throw new Exception("Copy should receive G from backend, but instead received: " + (char)response );
> }
>
> // send the whole input stream
> int b = in.read();
> while (b != -1) {
> pg_stream.SendChar((char)b);
> b = in.read();
> }
>
> //send the special row
> pg_stream.Send( new byte[] { (byte)'\\', (byte)'.', (byte)'\n' } );
> pg_stream.flush();
>
> String str = pg_stream.ReceiveString(this.getEncoding());
> // str should be "COPY" ?
> System.out.println( "Received String " + str );
> }
>
>
>
>############################################
>here's the class that's used to test the methods
>############################################
>
>import java.sql.*;
>import java.util.*;
>import javax.sql.*;
>import java.net.*;
>import java.io.*;
>
>public class TestCopy {
>
> static {
> try {
> Class.forName("org.postgresql.Driver");
> }
> catch (Exception e) {
> e.printStackTrace();
> System.err.println(e);
> System.exit(1);
> }
> }
>
> public TestCopy () throws Exception {
> // nothing in constructor
> }
>
> public static void main (String array[] ) throws Exception {
>
> Connection local_con = DriverManager.getConnection("jdbc:postgresql://vision/sync_corp2", "eagle" , "c0ntr0l");
>
> // cast the connection so that you can access methods not available in java.sql.Connection
> org.postgresql.Connection con = (org.postgresql.Connection)local_con;
>
> // create a byte stream by copying out data from the source table
> ByteArrayOutputStream out = con.copyOut("source_table");
>
> // copy the byte stream into another table. in practice, you'd use COPY to
> // copy data from one database to another, not just one table to another
> con.copyIn("destination_table", new ByteArrayInputStream(out.toByteArray()));
>
> }
>}
>
>
>
>
>
>
>
>
>
>On 14 Jun 2002, Dave Cramer wrote:
>
>
>
>>Michael,
>>
>>You are likely going to have to look at the code in psql, and create a
>>stream to copy from.
>>
>>Dave
>>On Fri, 2002-06-14 at 14:44, Michael Adler wrote:
>>
>>
>>>On Fri, 14 Jun 2002, Sam Varshavchik wrote:
>>>
>>>
>>>
>>>>Date: Fri, 14 Jun 2002 14:02:40 -0400
>>>>From: Sam Varshavchik <mrsam(at)courier-mta(dot)com>
>>>>To: Michael Adler <adler(at)glimpser(dot)org>
>>>>Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
>>>>Subject: Re: COPY support in pgsql-jdbc driver
>>>>
>>>>Michael Adler writes:
>>>>
>>>>
>>>>
>>>>>>>in the driver. I don't believe it is a jdbc standard though?
>>>>>>>
>>>>>>>
>>>>>>It's not. I'm quite happy with a separate API.
>>>>>>
>>>>>>
>>>>>Which API are you refering to? jxdbcon?
>>>>>
>>>>>
>>>>The org.postgresql package.
>>>>
>>>>
>>>Sam,
>>>
>>>Is there any documentaion on how to use COPY with the org.postgresql
>>>package? I haven't found any.
>>>
>>>org.postgresql.core.QueryExecutor doesn't seem to support it. How do you
>>>get it to work?
>>>
>>>Thanks,
>>>
>>>Mike
>>>
>>>
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>>>
>>>
>>>
>>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>>message can get through to the mailing list cleanly
>>
>>
>>
>
>Mike
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Sam Varshavchik 2002-06-20 05:55:40 Re: COPY support in pgsql-jdbc driver
Previous Message Parul Agarwal 2002-06-20 04:48:07 Posgresql - openoffice-JDBC