From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Mark Dilger <pgsql(at)markdilger(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: utf8 COPY DELIMITER? |
Date: | 2007-04-17 18:28:18 |
Message-ID: | 4129.1176834498@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Mark Dilger wrote:
>> I'm working on fixing bugs relating to multibyte character encodings.
>> I wasn't sure whether this was a bug or not. I don't think we should
>> use the phrasing "COPY delimiter must be a single character" when, in
>> utf8 land, I did in fact use a single character. We might say "a
>> single byte", or we might extend the functionality to handle multibyte
>> characters.
> Doing the latter would be a feature, and so is of course right off the
> table for this release. Changing the error messages to be clearer should
> be fine.
+1 on changing the message: "character" is clearly less correct than "byte"
here.
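
To illustrate the character/byte distinction, here is a minimal Python sketch. It is not PostgreSQL's actual check: the helper name is_single_byte_delimiter is hypothetical, and it assumes the delimiter is supplied in UTF-8.

```python
# Hypothetical illustration only; this is not the COPY code.
# A delimiter can be one character yet several bytes once encoded.

def is_single_byte_delimiter(delim: str, encoding: str = "utf-8") -> bool:
    """Return True if the delimiter encodes to exactly one byte."""
    return len(delim.encode(encoding)) == 1


if __name__ == "__main__":
    # "|" and tab are ASCII; U+00E9 and U+20AC are single characters
    # that take 2 and 3 bytes respectively in UTF-8.
    for delim in ["|", "\t", "\u00e9", "\u20ac"]:
        encoded = delim.encode("utf-8")
        print(repr(delim), "chars:", len(delim), "bytes:", len(encoded),
              "single byte:", is_single_byte_delimiter(delim))
```

Every delimiter in that list is "a single character", but only the first two are "a single byte", which is the restriction the message should describe.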
I doubt that supporting a single multibyte character would be an
interesting extension --- if we wanted to do anything at all there, we'd
just generalize the delimiter to be an arbitrary string. But that would
certainly slow down COPY by some amount, and COPY is an area where you'll
get push-back for performance losses, so you'd need to make a convincing
use-case for it.
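
A rough Python sketch of why a string delimiter means more work per input byte than a single-byte one. This is an assumption-laden illustration, not how COPY parses input: the function names split_on_byte and split_on_string are hypothetical, quoting and escaping are ignored, and nothing here measures the actual cost.

```python
from typing import List

# Hypothetical sketch; COPY's real parser is C code and also handles
# quoting, escapes, and encodings, none of which is modelled here.

def split_on_byte(line: bytes, delim: int) -> List[bytes]:
    """Split on one byte value: a single comparison per input byte."""
    fields, start = [], 0
    for i, b in enumerate(line):
        if b == delim:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


def split_on_string(line: bytes, delim: bytes) -> List[bytes]:
    """Split on an arbitrary byte string: compare a run of bytes at each position."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    fields, start, i, n = [], 0, 0, len(delim)
    while i <= len(line) - n:
        if line[i:i + n] == delim:
            fields.append(line[start:i])
            i += n
            start = i
        else:
            i += 1
    fields.append(line[start:])
    return fields


if __name__ == "__main__":
    print(split_on_byte(b"a|b|c", ord("|")))
    # A single multibyte character is just a special case of a string delimiter.
    print(split_on_string("a\u20acb\u20acc".encode("utf-8"), "\u20ac".encode("utf-8")))
```

The second function covers the multibyte-character case for free, which is why generalizing to an arbitrary string is the more natural extension; the extra comparison work at each position is where a slowdown would come from.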
regards, tom lane