| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> | 
|---|---|
| To: | itagaki(dot)takahiro(at)gmail(dot)com | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Support UTF-8 files with BOM in COPY FROM | 
| Date: | 2011-09-26 14:33:50 | 
| Message-ID: | 20110926.233350.224883171232526681.t-ishii@sraoss.co.jp | 
| Lists: | pgsql-hackers | 
> I'd like to support UTF-8 text or csv files that has BOM (byte order mark)
> in COPY FROM command. BOM will be automatically detected and ignored
> if the file encoding is UTF-8. WIP patch attached.
From RFC 3629 (http://tools.ietf.org/html/rfc3629#section-6):
 o A protocol SHOULD forbid use of U+FEFF as a signature for those
   textual protocol elements that the protocol mandates to be always
   UTF-8, the signature function being totally useless in those cases.
COPY explicitly specifies the encoding (UTF-8 in this case), so I
think we should not regard U+FEFF as a "BOM" in COPY. Rather, we should
regard it as "ZERO WIDTH NO-BREAK SPACE".
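
To make the distinction concrete, here is a minimal Python sketch of the two
readings of a leading U+FEFF in UTF-8 input. It is not PostgreSQL code and says
nothing about how the WIP patch works; the helper names and the sample row are
hypothetical. It relies only on Python's standard "utf-8" codec, which keeps
U+FEFF as a character, and "utf-8-sig" codec, which drops a leading signature.

```python
# Illustrative only: not PostgreSQL code. Helper names and sample data are hypothetical.

# U+FEFF encoded in UTF-8 is the three-byte sequence EF BB BF.
BOM_UTF8 = "\ufeff".encode("utf-8")


def decode_keep_feff(data: bytes) -> str:
    """Treat a leading U+FEFF as data (ZERO WIDTH NO-BREAK SPACE)."""
    return data.decode("utf-8")


def decode_strip_signature(data: bytes) -> str:
    """Treat a leading U+FEFF as a signature (BOM) and drop it."""
    return data.decode("utf-8-sig")


# A one-row CSV payload that starts with the three signature bytes.
sample = BOM_UTF8 + b"1,foo\n"

kept = decode_keep_feff(sample)            # first character is U+FEFF
stripped = decode_strip_signature(sample)  # first character is "1"

print(repr(kept))
print(repr(stripped))
```

Under the first reading the U+FEFF ends up as part of the first field of the
first row; under the second it is silently discarded before parsing.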
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp