From: | Barry Lind <barry(at)xythos(dot)com> |
---|---|
To: | Sam Varshavchik <mrsam(at)courier-mta(dot)com> |
Cc: | "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org> |
Subject: | Re: COPY support in pgsql-jdbc driver |
Date: | 2002-06-20 15:50:31 |
Message-ID: | 3D11F9C7.2080707@xythos.com |
Lists: | pgsql-jdbc |
Sam Varshavchik wrote:
> Barry Lind writes:
>
>> 8) Need to decide how to handle character set conversions, since you
>> are not currently doing any character set conversion for either the
>> input or the output. Since the client character set may be different
>> from the server character set, this needs to be considered. You
>> probably need an additional argument to each method for the character
>> set to use (probably also have methods without the extra parameter
>> that assume the default JVM character set should be used). If you know
>> that the source and target character sets are the same, you can
>> probably optimize the conversion into a no-op.
>
>
> What's being dumped and reloaded here is a byte-stream
> (InputStream/OutputStream), not a character-stream (Reader/Writer).
> Presumably, the only thing that's ever going to be reloaded is something
> that was dumped previously, so no conversions are necessary.
This is not correct. The data coming from the server is a stream of
characters in the server's character encoding. That encoding may be
different from the client's character encoding, and therefore character
set conversions are necessary. Let's say, for example, that the database
is running with UTF-8 as its character set, so the output of the COPY
will be UTF-8 encoded. If the client is running Latin1, there will be a
mismatch and every 8-bit character will be interpreted incorrectly by the
client. Character set conversion is necessary in this case.
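A minimal sketch of the conversion described above, assuming the COPY data has already been isolated into its own InputStream (so the decoder's read-ahead cannot swallow protocol bytes that follow it) and that the server encoding is known as a java.nio.charset.Charset. The class and method names (CopyTranscoder, transcode) are hypothetical and not part of the pgsql-jdbc driver. It includes the overload that falls back to the JVM default charset and the no-op shortcut when both charsets match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not a driver API: re-encodes COPY OUT text from the
// server's charset to the client's charset.
final class CopyTranscoder {
    private CopyTranscoder() {}

    static void transcode(InputStream in, Charset server, OutputStream out, Charset client)
            throws IOException {
        if (server.equals(client)) {
            // Same encoding on both sides: conversion is a no-op, so copy raw bytes.
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();
            return;
        }
        Reader reader = new InputStreamReader(in, server);   // decode server bytes to chars
        Writer writer = new OutputStreamWriter(out, client); // encode chars in client charset
        char[] cbuf = new char[8192];
        int n;
        while ((n = reader.read(cbuf)) != -1) {
            writer.write(cbuf, 0, n);
        }
        writer.flush(); // flush but do not close: the caller owns both streams
    }

    // Overload without the extra parameter: assumes the JVM default charset for the client.
    static void transcode(InputStream in, Charset server, OutputStream out) throws IOException {
        transcode(in, server, out, Charset.defaultCharset());
    }
}
```

InputStreamReader and OutputStreamWriter silently substitute replacement characters for malformed or unmappable input (for example, a character Latin1 cannot represent is written as '?'). A driver that must not lose data quietly might instead use a CharsetDecoder/CharsetEncoder configured with CodingErrorAction.REPORT.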
>
>> 9) I think the logic that looks for the end of data marker can be
>> more efficient. Off the top of my head (without giving too much
>> thought to it) something along the lines of:
>> read from stream into a buffer
>> loop through the buffer spitting out its contents while byte != '\\'.
>> When you find a '\\' in the stream then look forward two characters
>> and handle accordingly.
>> Reading one byte at a time from the stream will be slow, that is why
>> it would be better to read into a buffer.
>
>
> Just read from an InputStream, and let the caller worry about stacking
> a BufferedInputStream on top of it.
It will still be more efficient to do the buffering in the code than to
rely on a BufferedInputStream. A method call to fetch each new byte carries
much more overhead than iterating through a byte[].
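The buffered scan suggested in point 9 might look roughly like the following sketch. It assumes the COPY OUT text format, where a literal backslash in the data is sent doubled and the end of data is a line consisting of a backslash and a period followed by a newline. Because of the doubling, each backslash is consumed together with the byte after it, so an escaped backslash followed by a period is not mistaken for the marker. CopyEndScanner and scan are hypothetical names. State is kept between calls so a marker split across two reads is still found, and the return value tells the caller where any bytes after the marker begin.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical scanner: copies COPY OUT data from caller-filled byte[] chunks to
// an OutputStream, stopping at the "\." end-of-data line.
final class CopyEndScanner {
    private enum State { NORMAL, BACKSLASH, DOT }

    private State state = State.NORMAL;
    private boolean atLineStart = true;    // last byte written ended a line (or nothing written yet)
    private boolean backslashAtLineStart;  // the pending backslash was the first byte of a line

    // Scans buf[off, off + len). Returns the index just past the marker's '\n',
    // or -1 if the marker has not been seen yet and more input is needed.
    int scan(byte[] buf, int off, int len, OutputStream out) throws IOException {
        int end = off + len;
        int i = off;
        while (i < end) {
            switch (state) {
                case NORMAL: {
                    int j = i;
                    while (j < end && buf[j] != '\\') j++; // tight loop over the array
                    if (j > i) {
                        out.write(buf, i, j - i);          // flush the plain run in one call
                        atLineStart = buf[j - 1] == '\n';
                    }
                    if (j < end) {                         // hold the backslash until we see what follows
                        backslashAtLineStart = atLineStart;
                        state = State.BACKSLASH;
                        j++;
                    }
                    i = j;
                    break;
                }
                case BACKSLASH: {
                    byte b = buf[i++];
                    if (b == '.' && backslashAtLineStart) {
                        state = State.DOT;                 // possible terminator, need the '\n'
                    } else {
                        out.write('\\');                   // ordinary escape pair such as "\\" or "\t"
                        out.write(b);
                        atLineStart = false;
                        state = State.NORMAL;
                    }
                    break;
                }
                case DOT: {
                    if (buf[i] == '\n') {
                        state = State.NORMAL;              // reset so the scanner could be reused
                        atLineStart = true;
                        return i + 1;
                    }
                    out.write('\\');                       // not a terminator after all: pass
                    out.write('.');                        // "\." through and rescan this byte
                    atLineStart = false;
                    state = State.NORMAL;
                    break;
                }
            }
        }
        return -1;
    }
}
```

The caller reads a block with in.read(buf), passes it to scan, and repeats while scan returns -1; once it returns an index, any bytes from that index onward belong to whatever follows the COPY data and must be handed back to the protocol reader rather than discarded.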
thanks,
--Barry