Re: psql blows up on BOM character sequence

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: psql blows up on BOM character sequence
Date: 2014-03-21 21:28:13
Message-ID: 26038.1395437293@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Surely if it were really a major annoyance, someone would have sent code
> to fix it during the last 4 years and more since the above.

The code would probably be pretty trivial, *if* we had consensus on
what the behavior ought to be. I'm not sure if we do. People who
only use Unicode would probably like it if BOMs were unconditionally
swallowed, whether or not psql thinks the client_encoding is UTF8.
(And I seem to recall somebody even proposing that finding a BOM
be cause to switch the client_encoding to UTF8.) However, these
ideas are complete nonstarters for people who habitually use other
encodings.

The argument about SQL syntax carries no weight for me, at least --- what
about COPY data files? And I don't really want to suppose that \i can
never be used to insert a portion of a SQL command, either.

I'd be okay with swallowing a leading BOM if and only if client encoding
is UTF8. This should apply to any file psql reads, whether script or
data.
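
As an illustration only, the rule proposed above (drop a leading BOM if and only if the client encoding is UTF8) might be sketched as follows. This is not psql's code: the function names `is_utf8` and `strip_leading_bom` are hypothetical, and the way the encoding name is normalized is an assumption made for the example.

```python
import codecs

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def is_utf8(client_encoding: str) -> bool:
    """Hypothetical helper: treat 'UTF8', 'utf-8' and 'utf_8' as the same name.

    Assumption: this normalization is for illustration and is not
    taken from psql.
    """
    return client_encoding.replace("-", "").replace("_", "").upper() == "UTF8"


def strip_leading_bom(data: bytes, client_encoding: str) -> bytes:
    """Drop one leading UTF-8 BOM, but only when the client encoding is UTF8.

    The same check applies to any input, whether script or data;
    bytes after the first three are never touched.
    """
    if is_utf8(client_encoding) and data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


if __name__ == "__main__":
    script = UTF8_BOM + b"SELECT 1;"
    print(strip_leading_bom(script, "UTF8"))    # BOM removed
    print(strip_leading_bom(script, "LATIN1"))  # left as-is
```

Under LATIN1 the same three bytes decode as ordinary characters (ï, », ¿), which is why the sketch leaves them alone unless the encoding is UTF8.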

regards, tom lane
