Re: Unicode string literals versus the world

From:	Marko Kreen <markokr(at)gmail(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Unicode string literals versus the world
Date:	2009-04-14 11:38:38
Message-ID:	e51f66da0904140438p599d8debj17114a0976295a13@mail.gmail.com
Lists:	pgsql-hackers

On 4/14/09, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
> > It gets worse though: I have seldom seen such a badly designed piece of
> > syntax as the Unicode string syntax --- see
> > http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
> >
> > You scan the string, and then after that they tell you what the escape
> > character is!? Not to mention the obvious ambiguity with & as an
> > operator.
> >
> > If we let this go into 8.4, our previous rounds with security holes
> > caused by careless string parsing will look like a day at the beach.
> > No frontend that isn't fully cognizant of the Unicode string syntax is
> > going to parse such things correctly --- it's going to be trivial for
> > a bad guy to confuse a quoting mechanism as to what's an escape and what
> > isn't.
>
>
> Note that the escape character marks the Unicode escapes; it doesn't affect the
> quote characters that delimit the string. So offhand I can't see any potential
> for quote confusion/SQL injection type problems. Please elaborate if you see
> a problem.
>
> If there are problems, we could consider getting rid of the UESCAPE clause.
> Without it, the U&'' strings would behave much like the E'' strings. But I'd
> like to understand the problem first.

I think the problem is that they should not act like E'' strings; they
should act like plain '' strings, that is, they should follow the stdstr
(standard_conforming_strings) setting.

That way, existing tools that may (or may not) understand E'' and the stdstr
setting, but have definitely not heard of U&'' strings, can still
parse the SQL without new surprises.
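
To make the concern concrete, here is a minimal sketch in Python of how a
tool that knows only '' and E'' might locate the end of a string literal.
The function name literal_end and its stdstr parameter are hypothetical,
invented for this illustration; this is not how any particular driver or
the server lexer is written.

```python
def literal_end(sql, start, stdstr):
    """Return the index just past the literal whose opening quote is at
    sql[start], as seen by a tool that knows only '' and E'' strings."""
    assert sql[start] == "'"
    # Simplification: any e/E directly before the quote counts as an E prefix.
    has_e_prefix = start > 0 and sql[start - 1] in "eE"
    # Backslash escapes apply inside E'' always, and inside '' when stdstr is off.
    backslash_escapes = has_e_prefix or not stdstr
    i = start + 1
    while i < len(sql):
        ch = sql[i]
        if ch == "\\" and backslash_escapes:
            i += 2                      # skip the escaped character, even a quote
        elif ch == "'":
            if sql[i + 1:i + 2] == "'":
                i += 2                  # doubled quote stays inside the literal
            else:
                return i + 1            # closing quote found
        else:
            i += 1
    raise ValueError("unterminated string literal")


# The tool sees U, then &, then what it believes is a plain '' string.
sql = r"SELECT U&'a\', 'b'"
quote_at = sql.index("'")
end_if_stdstr_on = literal_end(sql, quote_at, stdstr=True)
end_if_stdstr_off = literal_end(sql, quote_at, stdstr=False)
```

With stdstr on, the tool stops at the quote right after the backslash; with
stdstr off, it treats \' as an escaped quote and keeps going. The tool's
answer is only right if U&'' strings are delimited by the same rules it
already applies to plain '' strings.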

If they already act that way then keeping U& should be fine.

And if UESCAPE does not affect the main string parsing, but is handled in
a second pass over the already-parsed string (like bytea \ escapes), then
that should also be fine and should not cause any new surprises.
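
A rough sketch of the two-pass idea, in Python. The helper names
split_quoted and decode_unicode_escapes are made up for this example, and
the escape forms handled (four hex digits, or + followed by six hex digits,
with a doubled escape character standing for itself) are the ones described
in the documentation linked above. Malformed escapes are left untouched
here, where a real implementation would presumably raise an error.

```python
import re


def split_quoted(sql, start):
    """Pass 1: return (body, index_after_closing_quote) for the quoted
    string opening at sql[start]. Only '' doubling is special here; the
    escape character plays no part in finding the closing quote."""
    assert sql[start] == "'"
    i = start + 1
    body = []
    while i < len(sql):
        if sql[i] == "'":
            if sql[i + 1:i + 2] == "'":
                body.append("'")        # doubled quote -> one literal quote
                i += 2
                continue
            return "".join(body), i + 1
        body.append(sql[i])
        i += 1
    raise ValueError("unterminated string")


def decode_unicode_escapes(body, esc="\\"):
    """Pass 2: expand escXXXX and esc+XXXXXX in an already-parsed body."""
    e = re.escape(esc)
    pattern = re.compile(
        e + r"(?:(" + e + r")|([0-9A-Fa-f]{4})|\+([0-9A-Fa-f]{6}))"
    )

    def repl(match):
        if match.group(1):              # doubled escape character
            return esc
        return chr(int(match.group(2) or match.group(3), 16))

    return pattern.sub(repl, body)


sql = "U&'d!0061t!+000061' UESCAPE '!'"
body, end = split_quoted(sql, sql.index("'"))
text = decode_unicode_escapes(body, esc="!")   # "data"
```

The point of the split is that the first pass gives the same string
boundaries whatever UESCAPE later says, so a tool that has never heard of
UESCAPE still agrees with the server about where the literal ends.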

But if not, it must go.

I would prefer that such quoting extensions wait until stdstr=on
is the only mode in which Postgres operates.
Fitting new quoting styles into an environment with a flippable stdstr
setting will be rather painful for everyone.

I still stand by my proposal: how about extending E'' strings with
Unicode escapes (e.g. \uXXXX)? E'' strings are already more
clearly defined than '' strings, and they are our "own": we don't need to
consider random standards, but can consider our sanity.
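
As an illustration of what the proposal would mean for a parser, here is a
small Python sketch that decodes backslash escapes in the body of an E''
string, with \uXXXX added. The function name decode_e_string is
hypothetical, and the sketch is deliberately incomplete: it leaves out
octal and hex byte escapes as well as '' doubling, and it assumes the body
has already been cut out of the surrounding quotes.

```python
import string


def decode_e_string(body):
    """Decode backslash escapes in the body of an E'' string,
    including the proposed \\uXXXX form (exactly four hex digits)."""
    simple = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
                raise ValueError("\\u must be followed by four hex digits")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            # Known single-letter escapes; any other character is taken literally.
            out.append(simple.get(nxt, nxt))
            i += 2
    return "".join(out)


text = decode_e_string(r"d\u0061t\u0061")   # "data"
```

Because the backslash is already the escape character in E'' strings, a
tool that handles E'' finds the same closing quote with or without \u
support; only the decoded value changes.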

--
marko

In response to

Re: Unicode string literals versus the world at 2009-04-14 10:52:37 from Peter Eisentraut

Responses

Re: Unicode string literals versus the world at 2009-04-14 12:10:54 from Andrew Dunstan
Re: Unicode string literals versus the world at 2009-04-14 12:53:52 from Peter Eisentraut
Re: Unicode string literals versus the world at 2009-04-14 15:54:33 from Tom Lane
