Re: Scadinavian characters in regular expressions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Søren Vainio <sva(at)Netpointers(dot)com>
Cc: "'Andreas Joseph Krogh'" <andreak(at)officenet(dot)no>, "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Scadinavian characters in regular expressions
Date: 2002-04-09 13:33:55
Message-ID: 28561.1018359235@sss.pgh.pa.us
Lists: pgsql-sql

Søren Vainio <sva(at)Netpointers(dot)com> writes:
> Using \s does produce FALSE for SELECT 'one two three' ~
> '^[^\s]+[\s][^\s]+$';
> But it also produces FALSE for any two-word string ex:
> SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$'; where I would expect TRUE???
> (I am using PostgreSQL 7.1.3)

I do not believe that Postgres' regular expression engine recognizes \s
as meaning anything except "s". See
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/functions-matching.html

In the above, it's even worse: the backslashes were eaten by the
string-literal parser, so what arrived at the RE engine was just
^[^s]+[s][^s]+$ ... not likely to produce what you wanted.
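To make the literal-parser point concrete, here is a small Python sketch. `parse_literal_body` is a hypothetical, simplified model of the old escape handling: the backslash is dropped and the following character is kept. It is not PostgreSQL code. Python's `re` stands in for the RE engine only to show which strings the mangled pattern can match.

```python
import re

def parse_literal_body(body: str) -> str:
    """Hypothetical, simplified model of old-style string-literal escapes:
    a backslash is dropped and the next character is kept as-is.
    (Real parsers also turn sequences like \\n or \\t into control
    characters; that detail is ignored here.)"""
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep the escaped char, lose the backslash
        else:
            out.append(ch)
    return "".join(out)

typed = r"^[^\s]+[\s][^\s]+$"       # what was typed between the quotes
arrived = parse_literal_body(typed)  # what the RE engine actually receives
print(arrived)                       # ^[^s]+[s][^s]+$

# Neither test string contains the letter "s", so the mangled pattern
# rejects both, which matches the FALSE results reported above.
for text in ("one two", "one two three"):
    print(text, bool(re.search(arrived, text)))

# For contrast only: Python's own re does treat \s as whitespace, so the
# pattern as typed behaves the way the poster expected *in Python*.
for text in ("one two", "one two three"):
    print(text, bool(re.search(typed, text)))
```

The last loop shows the intended behaviour (TRUE for two words, FALSE for three); it says nothing about whether a given Postgres version understands `\s`.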

As for the original issue, I wonder whether you are storing the string
as UTF-8 or Latin1 encoding. I have a suspicion that the å (a-ring)
is actually a multibyte sequence inside the database
and for some reason Postgres isn't configured to recognize it as a
single logical character.
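A quick Python illustration of the suspected encoding problem: å is one byte in Latin1 but two bytes in UTF-8, so anything that treats bytes as characters sees two "characters" where the user sees one. Python's byte-mode `re` is only a stand-in for a byte-oriented matcher here; it says nothing definitive about how a particular Postgres build is configured.

```python
import re

a_ring = "\u00e5"                  # å, LATIN SMALL LETTER A WITH RING ABOVE

latin1 = a_ring.encode("latin-1")  # b'\xe5'      -> one byte
utf8 = a_ring.encode("utf-8")      # b'\xc3\xa5'  -> two bytes
print(len(latin1), len(utf8))      # 1 2

# A byte-oriented matcher treats each byte as a "character": a single "."
# covers the Latin1 form, but the UTF-8 form needs two.
print(bool(re.fullmatch(rb".", latin1)))   # True
print(bool(re.fullmatch(rb".", utf8)))     # False
print(bool(re.fullmatch(rb"..", utf8)))    # True

# Decoded to text first, the same UTF-8 bytes form one character again.
print(bool(re.fullmatch(r".", utf8.decode("utf-8"))))  # True
```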

regards, tom lane

