Re: EOL characters and multibyte encodings

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: EOL characters and multibyte encodings
Date: 2007-06-21 22:38:46
Message-ID: 8064.1182465526@sss.pgh.pa.us
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> I finally was able to get PL/R to compile and run on Windows recently. This has
> led to people using a Windows-based client (typically PgAdmin III) to
> create PL/R functions. Immediately I started to receive reports of
> failures that turned out to be due to the carriage return (\r) used in
> standard Win32 EOLs (\r\n). It seems that the R parser only accepts
> newlines (\n), even on Win32 (confirmed on r-devel list with a core
> developer).

> My first thought on fixing this issue was to simply replace all
> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the
> R parser. As far as I know, any instances of '\r' embedded in a
> syntactically valid R statement must be escaped (i.e. literally the
> characters "\" and "r"), so that should not be a problem. But I am
> concerned about how this potentially plays against multibyte characters.
> Is it safe to do this, or do I need to use a mb-aware replace algorithm?

It's safe, because you'll be dealing with prosrc inside the backend,
therefore using a backend-legal encoding, and those don't have any ASCII
aliasing problems (all bytes of an MB character must have high bit set).
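A minimal sketch of why a plain byte scan is enough, assuming the function source arrives as a NUL-terminated string in a backend-legal encoding such as UTF-8. The helper names `is_mb_byte` and `count_cr` are hypothetical, for illustration only, and are not part of PostgreSQL or PL/R. Since '\r' is 0x0D (high bit clear), it can never match a byte that belongs to a multibyte character:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: in a backend-legal encoding, every byte of a
 * multibyte character has its high bit set. */
static bool is_mb_byte(unsigned char b)
{
    return (b & 0x80) != 0;
}

/* Hypothetical helper: count carriage returns with a plain byte scan.
 * No multibyte awareness is needed, because '\r' (0x0D) has the high
 * bit clear and so cannot occur inside a multibyte character. */
static size_t count_cr(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
    {
        if (*s == '\r')
            n++;
    }
    return n;
}
```

For a UTF-8 source containing a two-byte character (0xC3 0xA9) and two Windows line endings, `count_cr` sees only the two real carriage returns.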

However, I dislike doing it exactly that way because line numbers in the
R script will all get doubled: each \r\n line ending turns into \n\n.
Unless R never reports errors in terms of line numbers, you'd be better
off either deleting the \r characters or replacing them with spaces.
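The two alternatives could be sketched roughly as follows, assuming the source has already been copied into a writable, NUL-terminated buffer. The names `cr_to_space` and `strip_cr` are hypothetical and do not come from PL/R. Both leave the number of '\n' characters untouched, so line numbers in R error reports would still match the original script:

```c
#include <stddef.h>

/* Hypothetical helper: overwrite each '\r' with a space, in place.
 * The string length and the line count stay the same. */
static void cr_to_space(char *s)
{
    for (; *s != '\0'; s++)
    {
        if (*s == '\r')
            *s = ' ';
    }
}

/* Hypothetical helper: delete each '\r' in place by compacting the
 * buffer. Returns the new length; the line count stays the same. */
static size_t strip_cr(char *s)
{
    char *dst = s;

    for (const char *src = s; *src != '\0'; src++)
    {
        if (*src != '\r')
            *dst++ = *src;
    }
    *dst = '\0';
    return (size_t) (dst - s);
}
```

Replacing with a space keeps byte offsets stable, which may matter if anything else refers to positions in the source; deleting yields the cleaner text. Either way the scan is byte-wise, which is safe for the encoding reason given above.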

regards, tom lane
