Re: EOL characters and multibyte encodings

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: EOL characters and multibyte encodings
Date:	2007-06-21 22:51:13
Message-ID:	467B00E1.7070400@joeconway.com
Lists:	pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>> My first thought on fixing this issue was to simply replace all
>> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the
>> R parser. As far as I know, any instances of '\r' embedded in a
>> syntactically valid R statement must be escaped (i.e. literally the
>> characters "\" and "r"), so that should not be a problem. But I am
>> concerned about how this potentially plays against multibyte characters.
>> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
>
> It's safe, because you'll be dealing with prosrc inside the backend,
> therefore using a backend-legal encoding, and those don't have any ASCII
> aliasing problems (all bytes of an MB character must have high bit set).

Great -- I wasn't sure about that.
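
To illustrate Tom's point, here is a minimal sketch in C. The helper name all_high_bit is hypothetical and is not part of PostgreSQL or PL/R. It checks whether every byte of a multibyte character has the high bit set, which is the property that makes a byte-wise search for '\r' (0x0D) safe. UTF-8 and EUC-JP characters pass. A Shift-JIS character such as 0x83 0x5C fails because its second byte is the ASCII backslash, and this kind of aliasing is why Shift-JIS is accepted only as a client encoding.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if every one of the n bytes at s has its high bit set,
 * so none of them can be mistaken for an ASCII byte such as '\r'
 * or '\n'. Returns 0 as soon as an ASCII-range byte is found.
 */
static int
all_high_bit(const unsigned char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if ((s[i] & 0x80) == 0)
            return 0;           /* byte aliases an ASCII character */
    }
    return 1;
}
```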

> However I dislike doing it exactly that way because line numbers in the
> R script will all get doubled. Unless R never reports errors in terms
> of line numbers, you'd be better off to either delete the \r characters
> or replace them with spaces.

Good point. But I need to be able to deal with Apple EOLs too -- IIRC
those can be *only* '\r'. So I guess I need to do a look-ahead whenever
I run into '\r', see if it is followed by '\n', and then munge the
string accordingly.
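
A rough sketch of that look-ahead in C follows. The function name normalize_eol is hypothetical, and this is not the code that went into PL/R. It assumes a NUL-terminated, writable copy of prosrc in a backend encoding. It rewrites the buffer in place: "\r\n" collapses to a single '\n', a lone '\r' becomes '\n', and existing '\n' characters are left alone, so the line count (and therefore any line numbers R reports) stays the same.

```c
/*
 * Hypothetical helper, for illustration only.
 * Normalizes line endings in place:
 *   "\r\n" (DOS)            -> "\n"
 *   "\r"   (classic Mac OS) -> "\n"
 *   "\n"   (Unix)           -> unchanged
 * The result is never longer than the input, so the same buffer
 * can be used for reading and writing.
 */
static void
normalize_eol(char *src)
{
    char *in = src;     /* read position */
    char *out = src;    /* write position, never ahead of in */

    while (*in != '\0')
    {
        if (*in == '\r')
        {
            *out++ = '\n';
            in++;
            /* look ahead: if this was "\r\n", swallow the '\n' too */
            if (*in == '\n')
                in++;
        }
        else
            *out++ = *in++;
    }
    *out = '\0';
}
```

Scanning byte by byte is enough here because, as noted above, no byte of a multibyte character in a backend encoding can equal '\r' or '\n'.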

Joe

In response to

Re: EOL characters and multibyte encodings at 2007-06-21 22:38:46 from Tom Lane

Responses

Re: EOL characters and multibyte encodings at 2007-06-22 08:33:22 from William ZHANG
