
Re: pg should ignore u+200b zero width space

From:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	James Cloos <cloos(at)jhcloos(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: pg should ignore u+200b zero width space
Date:	2020-11-11 14:56:50
Message-ID:	739c79f3-ed87-f584-b209-d17318f7fdfe@enterprisedb.com
Lists:	pgsql-bugs

On 2020-11-03 15:52, Tom Lane wrote:
> The difficulty with doing anything in this space --- whether it be
> ignoring, throwing an error, or whatever --- is that it makes the
> lexer's behavior encoding-sensitive and potentially locale-sensitive.
> That's problematic for all sorts of reasons. One of the worst is
> that frontend programs such as psql and ecpg also have SQL lexers,
> and there'd be no way to keep their behavior in precise sync with
> the backend's. (They might not even be running in the same encoding,
> never mind locale.) It might even be possible to build security
> holes around that; recall the fun we've had with trying to lock
> down quoting rules in encodings where backslash can be part of a
> multibyte character :-(.

I think we could support whitespace properly if we're using Unicode, and
use ASCII whitespace for all other encodings. The combinations to deal
with are finite. But it is a tricky problem that needs to be approached
with care.
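
As a rough illustration of the split proposed above, here is a minimal sketch of a whitespace test that consults the Unicode White_Space property only when the encoding is Unicode and falls back to an ASCII set otherwise. All names (`ASCII_WHITESPACE`, `UNICODE_WHITE_SPACE`, `is_lexer_whitespace`) are hypothetical and are not PostgreSQL APIs; the ASCII set is an assumption about what a SQL lexer accepts, not a statement of what the backend does.

```python
# Hypothetical sketch; none of these names exist in PostgreSQL.

# Assumption: the ASCII whitespace a SQL lexer accepts in any encoding.
ASCII_WHITESPACE = frozenset(" \t\n\r\f")

# Code points carrying the Unicode White_Space property, as inclusive ranges.
_WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # TAB, LF, VT, FF, CR
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEXT LINE
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x1680, 0x1680),  # OGHAM SPACE MARK
    (0x2000, 0x200A),  # EN QUAD .. HAIR SPACE
    (0x2028, 0x2029),  # LINE SEPARATOR, PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]

UNICODE_WHITE_SPACE = frozenset(
    chr(cp) for lo, hi in _WHITE_SPACE_RANGES for cp in range(lo, hi + 1)
)


def is_lexer_whitespace(ch: str, encoding_is_unicode: bool) -> bool:
    """Return True if a lexer following the proposal would skip `ch`.

    Unicode encodings use the White_Space property; every other
    encoding is restricted to the ASCII set.
    """
    if encoding_is_unicode:
        return ch in UNICODE_WHITE_SPACE
    return ch in ASCII_WHITESPACE
```

With this sketch, `is_lexer_whitespace("\u3000", True)` is true while `is_lexer_whitespace("\u3000", False)` is false. Note that U+200B ZERO WIDTH SPACE is a format character (general category Cf) and does not carry the White_Space property, so the character in this thread's subject would still need a separate decision. A frontend lexer such as psql's would also have to apply the same rule under the same encoding to stay in sync, which is the concern raised in the quoted text.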

In response to

Re: pg should ignore u+200b zero width space at 2020-11-03 14:52:47 from Tom Lane
