Re: psql blows up on BOM character sequence

From: Jim Nasby <jim(at)nasby(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: psql blows up on BOM character sequence
Date: 2014-03-24 21:37:22
Message-ID: 5330A592.2080906@nasby.net
Lists: pgsql-hackers

On 3/24/14, 1:59 PM, Andrew Dunstan wrote:
>> It occurs to me that we're going about this the wrong way...
>>
>> The error here isn't being generated by psql; it's generated by the backend. In the context of a statement (and not, say, a COPY command).
>>
>> So instead of trying to handle this on the psql side[1], I think we need to handle it in the backend; specifically in the parser. Is there an easy way to get the parser to ignore the BOM character in the context of commands (but not in strings)?
>>
>> [1]: Obviously, BOM could still screw up a psql command like \d. We'd want to address that as well; but I suspect backends are the more common scenario.
>
>
> But what about COPY files? I don't see why there is less of a case for eating a leading BOM for a COPY file (or COPY stdin for that matter, given that it can come from \copy) than for an SQL file.

Wait... I thought that was one of the objections: that we wanted to leave a BOM in something like a COPY untouched? If that's not the case, why not just strip the BOM wherever it shows up in psql input? (Granted, that's not good for \copy or COPY performance, so we might want to special-case those, but that doesn't seem unreasonable...)
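
A minimal sketch of the "strip it from psql input" idea, written in Python rather than psql's C and meant only as an illustration. The helper name strip_leading_bom is hypothetical; it assumes UTF-8 input, works on raw bytes so it could run before any encoding conversion, and only touches a BOM at the very start of the input:

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration; not part of psql or the backend.
    A BOM anywhere other than the very start of the input is left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Because the check is a single prefix comparison on the first three bytes, the per-file cost is small; the performance concern would apply only if the stripping were done on every line or row of COPY data.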

> I suspect trying to do this in the parser will be quite messy. This needs to happen before the input is converted to the server encoding, I think.

My hope was that there's a point in the parser where we know whether we're dealing with a command string or raw data, and that we'd be able to only strip this from command strings... or better yet, get the code that looks for a command string to simply ignore BOM when it's parsing.

Uh... could we just treat BOM as another whitespace character? ISTM the case is basically the same: we don't want " INSERT ... VALUES( ' extra spaces ' ) ; " to blow up because of extra whitespace, but obviously ' extra spaces ' needs to stay intact.
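
To make the whitespace idea concrete, here is a rough sketch in Python (not backend code; the function name bom_as_whitespace is made up). It assumes the input is already decoded text, and it only understands plain single-quoted literals with '' as the escaped quote. Dollar quoting, E'' backslash escapes, quoted identifiers and comments are deliberately ignored, so this shows the behaviour rather than a real lexer:

```python
BOM = "\ufeff"  # U+FEFF, the byte order mark once the input is decoded


def bom_as_whitespace(sql: str) -> str:
    """Replace each BOM outside single-quoted literals with a space.

    Simplified illustration: tracks only '...' literals. A doubled quote
    ('') toggles the state twice, so it correctly stays inside the literal.
    """
    out = []
    in_literal = False
    for ch in sql:
        if ch == "'":
            in_literal = not in_literal
            out.append(ch)
        elif ch == BOM and not in_literal:
            out.append(" ")  # treated like any other whitespace
        else:
            out.append(ch)  # includes a BOM inside a literal, kept intact
    return "".join(out)
```

With this, a BOM in front of INSERT turns into a harmless space, while a BOM inside ' extra spaces ' is passed through unchanged, which is the same split we already make for ordinary whitespace.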
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
