Re: Using psql -f to load a UTF8 file

From: Chris Angelico <rosuav(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Using psql -f to load a UTF8 file
Date: 2012-09-21 03:07:11
Message-ID: CAPTjJmrx3Njx30=F9indfZZ5_8v5xfWsWZqD2aLiLLXmu78O_w@mail.gmail.com
Lists: pgsql-general

On Fri, Sep 21, 2012 at 11:21 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of encodings
> that any given file could be.

The only reliable way to ascertain the encoding of a hunk of data is
with something out-of-band. Relying on the first three bytes being
\xEF\xBB\xBF is not much more reliable than detecting based on octet
frequency, which is what leads to the "Bush hid the facts" hack in
Notepad. This is why many Internet protocols carry metadata along
with the file (e.g. Content-Type in HTTP), rather than relying on
internal evidence.
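
To illustrate the out-of-band approach, here is a rough Python sketch that reads the charset from a Content-Type header value instead of inspecting the payload. The helper name charset_from_content_type is made up for this example; it leans on the standard library's email.message.Message, and says nothing about how any particular HTTP client actually does it.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset declared in a Content-Type header value.

    Hypothetical helper for illustration: the encoding comes from
    metadata carried alongside the data, not from the data itself.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns the
    # fallback when no charset parameter is present.
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_from_content_type("text/plain; charset=UTF-8"))
    print(charset_from_content_type("text/plain"))
```

When the header carries no charset parameter, the caller gets the default back and has to decide what to assume, which is exactly the situation a bare file on disk is always in.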

> psql should accept UTF-8 with BOM.

However, this I would agree with. It's cheap enough to detect, and
aside from arbitrarily trying to kill Notepad (which won't happen
anyway), there's not a lot of reason to choke on the BOM. But it's not
a big deal.
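
For what it's worth, the check really is cheap. A minimal sketch in Python, assuming the input is already known (out-of-band) to be UTF-8 and we only want to skip a leading BOM; strip_utf8_bom is a hypothetical name, and this is not psql's actual code:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present.

    Hypothetical helper: it does not try to detect the encoding,
    it only skips the three marker bytes at the start of the input.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    print(strip_utf8_bom(b"\xef\xbb\xbfSELECT 1;"))
    print(strip_utf8_bom(b"SELECT 1;"))
```

Python's own "utf-8-sig" codec does the same thing at decode time, so the bytes could also simply be decoded with data.decode("utf-8-sig").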

ChrisA
