Sv: Re: regex match and special characters

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Sv: Re: regex match and special characters
Date: 2018-08-16 14:04:34
Message-ID: VisenaEmail.3b.faeb5a9ceebe8f3f.165430c040f@tc7-visena
Lists: pgsql-general

On Thursday, 16 August 2018 at 15:16:52, Adrian Klaver
<adrian(dot)klaver(at)aklaver(dot)com> wrote:
On 08/16/2018 03:59 AM, Alex Kliukin wrote:
> Hi,
>
> Here is a simple SQL statement that gives different results on PostgreSQL
> 9.6 and PostgreSQL 10+. The space character at the end of the string is
> actually U+2006 SIX-PER-EM SPACE
> (http://www.fileformat.info/info/unicode/char/2006/index.htm)
>
> test=# select 'abcd ' ~ 'abcd\s';
>   ?column?
> ----------
>   t
> (1 row)
>
> test=# select version();
>                                               version
> -------------------------------------------------------------------------------------------------
>   PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> (1 row)
>
>
> On another server (running on the same system on a different port)
>
> postgres=# select version();
>                                              version
> -----------------------------------------------------------------------------------------------
>   PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0, 64-bit
> (1 row)
>
> postgres=# select 'abcd ' ~ 'abcd\s';
>   ?column?
> ----------
>   f
> (1 row)
>
> For both clusters, the client encoding is UTF8, the database encoding and
> collation are UTF8 and en_US.utf8 respectively, and the lc_ctype is
> en_US.utf8. I am accessing the databases running locally by ssh-ing first
> to the host.
>
> I observed similar issues with other Linux-based servers running Ubuntu; in
> all cases the regex resulted in true on PostgreSQL 10+ and false on earlier
> versions (down to 9.3). The query comes from a table check that suddenly
> stopped accepting rows valid in the older version during the migration.
> Making it select 'abcd ' ~ E'abcd\\s' doesn't modify the outcome,
> unsurprisingly.
>
> Is it reproducible for others here as well? If it is, is there a way to
> make both versions behave the same?

select version();
                                       version
------------------------------------------------------------------------------------
  PostgreSQL 10.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit

lc_collate                          | en_US.UTF-8
lc_ctype                            | en_US.UTF-8

test=# select 'abcd'||chr(2006) ~ E'abcd\s';
  ?column?
----------
  f
(1 row)

In your example you are working on Postgres devel. Have you tried it on
Postgres 10 and/or 11?
 
chr(2006) produces the wrong character, because 2006 is the hexadecimal
code point (U+2006) while the literal passed to chr() is read as decimal.
You have to use 8198:
 
andreak(at)[local]:5433 10.4 andreak=# select version();
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            version                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

andreak(at)[local]:5433 10.4 andreak=# select 'abcd'||chr(8198) ~ 'abcd\s';
┌──────────┐
│ ?column? │
├──────────┤
│ t        │
└──────────┘
(1 row)
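
As a quick cross-check outside PostgreSQL, the hex/decimal mix-up can be
reproduced in a few lines of Python. This is only a sketch: the helper names
(describe, matches) are made up for illustration, and Python's \s comes from
a different regex engine than PostgreSQL's, so it says nothing about the
9.6 versus 10 difference itself. It only shows that 2006 and 8198 name two
different characters.

```python
import re
import unicodedata

# U+2006 is written in hexadecimal; as a decimal integer it is 8198.
SIX_PER_EM_SPACE = 0x2006


def describe(code_point: int) -> str:
    """Return the code point in U+XXXX form with its Unicode name."""
    ch = chr(code_point)
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{code_point:04X} {name} (whitespace: {ch.isspace()})"


def matches(code_point: int) -> bool:
    """Check 'abcd' + character against abcd\\s using Python's own regex engine."""
    return re.fullmatch(r"abcd\s", "abcd" + chr(code_point)) is not None


if __name__ == "__main__":
    # Decimal 2006 is code point U+07D6, not the space from the original report.
    print(describe(2006), "| matches:", matches(2006))
    # Decimal 8198 is U+2006, the character actually in the test string.
    print(describe(8198), "| matches:", matches(8198))
```

The same conversion is why chr(8198), not chr(2006), is the right call in
the query above.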
 
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com
https://www.visena.com

 
