From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marko Kreen <markokr(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unicode string literals versus the world |
Date: | 2009-04-14 15:54:33 |
Message-ID: | 12063.1239724473@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Marko Kreen <markokr(at)gmail(dot)com> writes:
> I would prefer that such quoting extensions wait until
> stdstr=on is the only mode in which Postgres operates.
> Fitting new quoting styles into an environment with a flippable
> stdstr setting will be rather painful for everyone.
It would certainly be a lot safer to wait until non-standard-conforming
strings don't exist anymore. The problem is that this may never happen,
and it is certainly not on the roadmap for the foreseeable future.
> I still stand by my proposal: how about extending E'' strings with
> Unicode escapes (e.g. \uXXXX)? The E'' strings are already more
> clearly defined than '' and they are our "own"; we don't need to
> consider random standards, but can consider our sanity.
That's one way we could proceed. The other proposal that seemed
attractive to me was a decode-like function:
    uescape('foo\00e9bar')
    uescape('foo\00e9bar', '\')
(double all the backslashes if you assume standard_conforming_strings
is off). The arguments in favor of this one
are (1) you can apply it to the result of an expression, it's not
strictly tied to literals; and (2) it's a lot lower-footprint solution
since it doesn't affect basic literal handling. If you wish to suppose
that this is only a stopgap until someday when we can implement the SQL
standard syntax more safely, then low footprint is good. One could
even imagine back-porting this into existing releases as a user-defined
function.
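To make the intended semantics concrete, here is a minimal Python sketch of what such a decode-like function might do. It is only an illustration, not a proposed implementation: the four-hex-digit form is taken from the examples above, while the treatment of a doubled escape character as one literal escape character and the choice to leave a malformed escape untouched are assumptions of this sketch.

```python
import re


def uescape(text, escape="\\"):
    """Decode <escape>XXXX (four hex digits) into the matching code point.

    Assumptions of this sketch: a doubled escape character yields one
    literal escape character, and an escape character followed by
    anything else is left unchanged.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")
    esc = re.escape(escape)
    # Either four hex digits or a second escape character may follow.
    pattern = re.compile(esc + r"(?:([0-9A-Fa-f]{4})|(" + esc + r"))")

    def replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return escape

    return pattern.sub(replace, text)


print(uescape("foo\\00e9bar"))       # prints fooébar
print(uescape("foo!00e9bar", "!"))   # prints fooébar
```

Because the function takes an ordinary string argument, it works equally well on the result of an expression. With standard_conforming_strings off, the SQL literal itself would have to be written with doubled backslashes so that a single backslash reaches the function.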
The solution with \u in extended literals is probably workable too.
I'm slightly worried about the possibility of issues with code that
thinks it knows what an E-literal means but doesn't really. In
particular something might think it knows that "\u" just means "u",
and proceed to strip the backslash. I don't see a path for that to
become a security hole though, only a garden-variety bug. So I could
live with that one on the grounds of being easier to use (which it
would be, because of less typing compared to uescape()).
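The kind of garden-variety bug I have in mind can be sketched in Python as follows. Both functions are hypothetical stand-ins for client-side code that unescapes the body of an E-literal; they handle only a few single-letter escapes and omit octal and hex escapes for brevity. The first treats any unknown backslash sequence as the bare character, so it mangles a \uXXXX escape instead of decoding it.

```python
import re

# A few single-letter escapes; octal and hex forms are omitted here.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def naive_unescape(body):
    """Unescape as older code might: an unknown '\\x' becomes plain 'x'."""
    return re.sub(
        r"\\(.)",
        lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(1)),
        body,
        flags=re.DOTALL,
    )


def unicode_aware_unescape(body):
    """Same as above, but decode '\\uXXXX' before the fallback rule."""

    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return SIMPLE_ESCAPES.get(m.group(2), m.group(2))

    return re.sub(r"\\(?:u([0-9A-Fa-f]{4})|(.))", replace, body, flags=re.DOTALL)


body = r"caf\u00e9"
print(naive_unescape(body))          # prints cafu00e9
print(unicode_aware_unescape(body))  # prints café
```

The naive version produces wrong data, but it never turns an escaped character into a quote or a backslash that was not already there, which is why this looks like an ordinary bug rather than an injection risk.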
regards, tom lane