From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
---|---|
To: | Alex Kliukin <alexk(at)hintbits(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: regex match and special characters |
Date: | 2018-08-16 13:16:52 |
Message-ID: | d438c15e-f960-3705-459c-53b0e3f3366a@aklaver.com |
Lists: | pgsql-general |
On 08/16/2018 03:59 AM, Alex Kliukin wrote:
> Hi,
>
> Here is a simple SQL statement that gives different results on PostgreSQL 9.6 and PostgreSQL 10+. The space character at the end of the string is actually U+2006 SIX-PER-EM SPACE (http://www.fileformat.info/info/unicode/char/2006/index.htm)
>
> test=# select 'abcd ' ~ 'abcd\s';
> ?column?
> ----------
> t
> (1 row)
>
> test=# select version();
> version
> -------------------------------------------------------------------------------------------------
> PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> (1 row)
>
>
> On another server (running on the same system on a different port)
>
> postgres=# select version();
> version
> -----------------------------------------------------------------------------------------------
> PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> (1 row)
>
> postgres=# select 'abcd ' ~ 'abcd\s';
> ?column?
> ----------
> f
> (1 row)
>
> For both clusters, the client encoding is UTF8, the database encoding and collation are UTF8 and en_US.utf8 respectively, and lc_ctype is en_US.utf8. I am accessing the databases locally, after first ssh-ing to the host.
>
> I observed similar issues with other Linux-based servers running Ubuntu, in all cases the regex resulted in true on PostgreSQL 10+ and false on earlier versions (down to 9.3). The query comes from a table check that suddenly stopped accepting rows valid in the older version during the migration. Making it select 'abcd ' ~ E'abcd\\s' doesn't modify the outcome, unsurprisingly.
>
> Is it reproducible for others here as well? If it is, is there a way to make both versions behave the same?
select version();
version
------------------------------------------------------------------------------------
PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
test=# select 'abcd'||chr(2006) ~ E'abcd\s';
?column?
----------
f
(1 row)
In your example you are working on Postgres devel. Have you tried it on
Postgres 10 and/or 11?
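
A caveat on the chr(2006) test above: in a UTF8 database, chr() takes the code point as a decimal integer, while U+2006 is hexadecimal (8198 in decimal). Decimal 2006 is therefore a different character, U+07D6. Separately, inside an E'' string a backslash followed by an ordinary character is taken literally, so E'abcd\s' reaches the regex engine as 'abcds' with no \s class at all. Either point alone could explain the 'f' result, so a retry along the lines of 'abcd'||chr(8198) ~ 'abcd\s' may be more telling (not run here). The code point arithmetic can be checked outside the database. The following is only a sketch in Python; the helper name describe is made up for illustration, and Python's re module is not PostgreSQL's regex engine, so it says nothing about the 9.6 versus 10+ difference itself.

```python
import re
import unicodedata

# U+2006 is a hexadecimal code point; as a decimal integer it is 8198.
SIX_PER_EM_SPACE = 0x2006


def describe(code_point: int) -> str:
    """Return the code point, Unicode name and general category (hypothetical helper)."""
    ch = chr(code_point)
    return f"U+{code_point:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


# Decimal value that a chr()-style function taking an integer would need.
print(SIX_PER_EM_SPACE)

# The character from the original report: a space separator (category Zs).
print(describe(SIX_PER_EM_SPACE))

# What decimal 2006 refers to instead: a letter, not whitespace.
print(describe(2006))

# Python's regex engine (an assumption-free stand-in only for the arithmetic,
# not for PostgreSQL behaviour) treats U+2006 as whitespace and U+07D6 as not.
matches_space = re.search(r"abcd\s", "abcd" + chr(SIX_PER_EM_SPACE)) is not None
matches_2006 = re.search(r"abcd\s", "abcd" + chr(2006)) is not None
print(matches_space, matches_2006)
```

If the retry with chr(8198) and a plain 'abcd\s' pattern still differs between 9.6 and 10+, that would point at the server's character classification rather than at how the test string was built.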
>
> Cheers,
> Alex
>
>
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com