
Re: Support UTF-8 files with BOM in COPY FROM

From:	Brar Piening <brar(at)gmx(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support UTF-8 files with BOM in COPY FROM
Date:	2011-09-26 18:57:25
Message-ID:	4E80CB15.10706@gmx.de
Lists:	pgsql-hackers

Tom Lane wrote:
> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
> that Microsloth does it does not make it standards-conformant.

Could you share a pointer to the spec?
All I've ever heard is that a BOM is optional for UTF-8 but not forbidden.

The Unicode FAQ (http://unicode.org/faq/utf_bom.html#BOM) states that
"some recipients of UTF-8 encoded data do not expect a BOM".
Postgres obviously belongs to those recipients.
That's why all my psql scripts transferring data from MSSQL to Postgres
need a '\! perl -CD -pi.orig -e "tr/\x{feff}//d" "C:/datafile.txt"'
before feeding data into COPY FROM.

Reading a BOM tolerantly and writing one only on user request is probably
the behaviour that would help most users.
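
To illustrate what "reading it tolerantly" could mean, here is a minimal Python sketch, not a proposal for how COPY FROM should be implemented. It assumes the input is UTF-8 and removes only a BOM at the very start of the data. That is narrower than the perl one-liner above, which deletes every U+FEFF in the file. The function name strip_utf8_bom is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only. A U+FEFF that appears
    anywhere other than the start of the data is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Example: a tab-separated row as a BOM-writing exporter might emit it.
with_bom = codecs.BOM_UTF8 + b"1\tfoo\n"
without_bom = b"1\tfoo\n"

cleaned_with = strip_utf8_bom(with_bom)
cleaned_without = strip_utf8_bom(without_bom)
```

Python's built-in "utf-8-sig" codec does the same thing when decoding: it drops a leading BOM if there is one and otherwise behaves like "utf-8".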

Regards,

Brar

In response to

Re: Support UTF-8 files with BOM in COPY FROM at 2011-09-26 14:44:38 from Tom Lane
