From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Joe Conway <mail(at)joeconway(dot)com> |
Cc: | "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: EOL characters and multibyte encodings |
Date: | 2007-06-21 22:38:46 |
Message-ID: | 8064.1182465526@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Joe Conway <mail(at)joeconway(dot)com> writes:
> I finally was able to get PL/R to compile and run on Windows recently. This
> has led to people using a Windows-based client (typically PgAdmin III) to
> create PL/R functions. Immediately I started to receive reports of
> failures that turned out to be due to the carriage return (\r) used in
> standard Win32 EOLs (\r\n). It seems that the R parser only accepts
> newlines (\n), even on Win32 (confirmed on r-devel list with a core
> developer).
> My first thought on fixing this issue was to simply replace all
> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the
> R parser. As far as I know, any instances of '\r' embedded in a
> syntactically valid R statement must be escaped (i.e. literally the
> characters "\" and "r"), so that should not be a problem. But I am
> concerned about how this potentially plays against multibyte characters.
> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
It's safe, because you'll be dealing with prosrc inside the backend,
therefore using a backend-legal encoding, and those don't have any ASCII
aliasing problems (all bytes of an MB character must have high bit set).
However, I dislike doing it exactly that way because each \r\n becomes
\n\n, so line numbers in the R script will all get doubled. Unless R
never reports errors in terms
of line numbers, you'd be better off to either delete the \r characters
or replace them with spaces.
regards, tom lane
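
The two alternatives suggested above (replace each \r with a space, or delete it) can be sketched in C as follows. This is only an illustration and is not taken from the PL/R source: the function names cr_to_space and delete_cr are hypothetical, and the input is assumed to be a NUL-terminated, writable copy of prosrc in a backend-legal encoding, so a byte-wise scan cannot hit a \r byte inside a multibyte character.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from PL/R): overwrite every carriage return
 * with a space, in place. The string length and the number of '\n'
 * characters are unchanged, so R's line numbers still match the source.
 */
static void
cr_to_space(char *src)
{
    for (char *p = src; *p != '\0'; p++)
    {
        if (*p == '\r')
            *p = ' ';
    }
}

/*
 * Hypothetical alternative: delete every carriage return, compacting the
 * string in place. Returns the new length. The '\n' count is unchanged.
 */
static size_t
delete_cr(char *src)
{
    char *dst = src;

    for (const char *p = src; *p != '\0'; p++)
    {
        if (*p != '\r')
            *dst++ = *p;
    }
    *dst = '\0';
    return (size_t) (dst - src);
}
```

Either variant turns "x <- 1\r\ny <- 2\r\n" into a two-line script, whereas replacing \r with \n would yield four lines. Both work on single bytes only, which relies on the property stated in the message: in backend encodings every byte of a multibyte character has its high bit set, so none can equal 0x0D.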