
Re: COPY support in pgsql-jdbc driver

From:	Barry Lind <barry(at)xythos(dot)com>
To:	Sam Varshavchik <mrsam(at)courier-mta(dot)com>
Cc:	"pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject:	Re: COPY support in pgsql-jdbc driver
Date:	2002-06-20 15:50:31
Message-ID:	3D11F9C7.2080707@xythos.com
Lists:	pgsql-jdbc

Sam Varshavchik wrote:

> Barry Lind writes:
>
>> 8) Need to decide how to handle character set conversions, since you
>> are not currently doing any character set conversions for either the
>> input or output. Since the client character set may be different
>> than the server character set, this needs to be considered. You
>> probably need an additional argument to each method for the character
>> set to use (probably also have methods without the extra parameter
>> that assume the default JVM character set should be used). You can
>> probably optimize this to a no-op if you know that the source and
>> target character sets are the same.
>
>
> What's being dumped and reloaded here is a byte-stream
> (InputStream/OutputStream), not a character-stream (Reader/Writer).
> Presumably, the only thing that's ever going to be reloaded is
> something that was dumped previously, so no conversions are necessary.

This is not correct. The data coming from the server is a stream of
characters in the character encoding of the server. This encoding may
differ from the client's character encoding, and therefore character set
conversions are necessary. Let's say, for example, that the database is
running with UTF-8 as its character set, so the output of the COPY will
be UTF-8 encoded. If the client is running Latin1, there will be a
mismatch and all 8-bit characters will be interpreted incorrectly by the
client. Character set conversion is necessary in this case.
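A minimal sketch of what that conversion could look like, assuming the caller knows the server's encoding. The class and method names (CopyTranscoder, transcode) are hypothetical and not part of the driver. The overload without a client charset falls back to the JVM default, and equal charsets take a raw byte-copy path, which is the no-op optimization suggested in point 8. Characters the client charset cannot represent are silently replaced by the encoder (typically with '?').

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical helper: re-encodes COPY text from the server charset to the client charset. */
public final class CopyTranscoder {
    private static final int BUF_SIZE = 8192;

    private CopyTranscoder() {}

    /** Variant without a client charset: assumes the JVM default charset on the client side. */
    public static void transcode(InputStream in, OutputStream out, Charset serverCharset)
            throws IOException {
        transcode(in, out, serverCharset, Charset.defaultCharset());
    }

    public static void transcode(InputStream in, OutputStream out,
                                 Charset serverCharset, Charset clientCharset) throws IOException {
        if (serverCharset.equals(clientCharset)) {
            // Same encoding on both ends: the conversion is a no-op, so copy raw bytes.
            byte[] buf = new byte[BUF_SIZE];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();
            return;
        }
        // Decode bytes using the server encoding, then re-encode in the client encoding.
        Reader reader = new InputStreamReader(in, serverCharset);
        Writer writer = new OutputStreamWriter(out, clientCharset);
        char[] cbuf = new char[BUF_SIZE];
        int n;
        while ((n = reader.read(cbuf)) != -1) {
            writer.write(cbuf, 0, n);
        }
        // Flush buffered chars but leave closing the streams to the caller.
        writer.flush();
    }
}
```

For example, the UTF-8 bytes of "café" read back as "cafÃ©" when decoded as Latin1, which is exactly the mismatch described above; transcode(in, out, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1) instead writes the single Latin1 byte 0xE9 for "é".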

>
>> 9) I think the logic that looks for the end of data marker can be
>> more efficient. Off the top of my head (without giving too much
>> thought to it) something along the lines of:
>> read from stream into a buffer
>> loop through the buffer spitting out its contents while byte != '\\'.
>> When you find a '\\' in the stream then look forward two characters
>> and handle accordingly.
>> Reading one byte at a time from the stream will be slow, that is why
>> it would be better to read into a buffer.
>
>
> Just read from an InputStream, and let the caller worry about stacking
> a BufferedInputStream on top of it.

It will still be more efficient to do the buffering in the code than to
rely on a BufferedInputStream. Performing a method call to get each new
byte carries much more overhead than iterating through a byte[].
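The buffered scan described in point 9 might look roughly like this. It is an illustration under assumptions, not the driver's code. CopyOutScanner and copyUntilEndMarker are hypothetical names. The end-of-data marker is taken to be a line containing only a backslash and a period followed by a newline. The byte-level scan assumes an ASCII-safe server encoding such as UTF-8 or Latin1, where the backslash, period and newline bytes never occur inside a multibyte character. Ordinary bytes are written out in runs with a single out.write call rather than one at a time. A backslash is handled by looking at the next two bytes, refilling the buffer first if they have not arrived yet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies COPY OUT text data up to the end-of-data line. */
public final class CopyOutScanner {
    private static final int BUF_SIZE = 8192;

    private CopyOutScanner() {}

    /** Returns true if the end-of-data line was found, false if the stream ended first. */
    public static boolean copyUntilEndMarker(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[BUF_SIZE];
        int len = 0;                // valid bytes in buf[0..len)
        boolean atLineStart = true; // the marker only counts at the start of a line
        while (true) {
            int n = in.read(buf, len, buf.length - len);
            if (n == -1) {
                out.write(buf, 0, len); // no marker: flush whatever is left
                return false;
            }
            len += n;
            int i = 0;
            while (i < len) {
                byte b = buf[i];
                if (b != '\\') {
                    // Ordinary byte: keep scanning, it is written out in bulk below.
                    atLineStart = (b == '\n');
                    i++;
                    continue;
                }
                if (i + 2 >= len) {
                    break; // fewer than two bytes of lookahead: refill first
                }
                if (atLineStart && buf[i + 1] == '.' && buf[i + 2] == '\n') {
                    out.write(buf, 0, i); // data before the marker
                    return true;          // bytes after the marker are discarded here
                }
                // Escape sequence (e.g. backslash-backslash or backslash-t): keep the pair together.
                atLineStart = (buf[i + 1] == '\n');
                i += 2;
            }
            out.write(buf, 0, i);
            // Move the unscanned tail (at most two bytes) to the front of the buffer.
            int tail = len - i;
            System.arraycopy(buf, i, buf, 0, tail);
            len = tail;
        }
    }
}
```

Consuming the escape pair as a unit matters: an escaped backslash followed by a period is not mistaken for the marker. One caveat of reading ahead is that any bytes following the marker in the buffer are dropped in this sketch. Inside the driver the buffer would need to belong to the connection's input handling so that whatever comes after the COPY data is not lost.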

thanks,
--Barry

In response to

Re: COPY support in pgsql-jdbc driver at 2002-06-20 05:55:40 from Sam Varshavchik

Responses

Re: COPY support in pgsql-jdbc driver at 2002-06-20 16:35:57 from Sam Varshavchik
