Re: Scandinavian characters in regular expressions

From: Søren Vainio <sva(at)Netpointers(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Scandinavian characters in regular expressions
Date: 2002-04-09 14:21:43
Message-ID: 910513A5A944D5118BE900C04F67CB5A1F82C7@MAIL
Lists: pgsql-sql

There is obviously a problem with the special characters.
The query SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$';
produced FALSE on a database with ENCODING = 'LATIN1' and TRUE on a database
with ENCODING = 'UNICODE'.
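
For reference, here is a minimal sketch of what the pattern should return when matching works on whole characters. It uses Python's `re` module purely as a stand-in for the database's regex engine, and the helper name `is_two_words` is hypothetical. The string has three words, so FALSE is the expected answer and the TRUE from the UNICODE database is the anomaly.

```python
import re

# Same pattern as in the query: non-spaces, exactly one space, non-spaces.
TWO_WORDS = re.compile(r'^[^ ]+[ ][^ ]+$')

def is_two_words(text: str) -> bool:
    """Return True when text consists of exactly two space-separated words."""
    return TWO_WORDS.search(text) is not None

print(is_two_words('oneå two three'))  # three words
print(is_two_words('oneå two'))        # two words
```

Matched character by character, the first call reports no match and the second reports a match, regardless of the å.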

Do you have a suggestion for how I can count the two-word strings
with ENCODING = 'UNICODE'?

Thank you
Søren Vainio

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 9 April 2002 15:34
> To: Søren Vainio
> Cc: 'Andreas Joseph Krogh'; 'pgsql-sql(at)postgresql(dot)org'
> Subject: Re: [SQL] Scandinavian characters in regular expressions
>
>
> Søren Vainio <sva(at)Netpointers(dot)com> writes:
> > Using \s does produce FALSE for
> > SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
> > But it also produces FALSE for any two-word string, e.g.
> > SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$';
> > where I would expect TRUE???
> > (I am using PostgreSQL 7.1.3)
> > (I am using PostgreSQL 7.1.3)
>
> I do not believe that Postgres' regular expression engine
> recognizes \s as meaning anything except "s". See
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/functions-matching.html

In the above, it's even worse: the backslashes were eaten by the
string-literal parser, so what arrived at the RE engine was just
^[^s]+[s][^s]+$ ... not likely to produce what you wanted.
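
The effect can be reproduced outside the database. The following is a minimal sketch, with Python's `re` module standing in for the Postgres RE engine, that applies the pattern as it looks once the backslashes are gone. The helper name `matches` is hypothetical.

```python
import re

# What the RE engine receives after the string-literal parser
# has dropped the backslashes: "s" is now a literal letter.
STRIPPED = re.compile(r'^[^s]+[s][^s]+$')

def matches(text: str) -> bool:
    """Return True when text matches the backslash-stripped pattern."""
    return STRIPPED.search(text) is not None

print(matches('one two'))         # no literal "s" in the string
print(matches('oneå two three'))  # no literal "s" in the string
print(matches('onestwo'))         # the letter "s" acts as the separator
```

Only the last string matches, which is consistent with both queries returning FALSE.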

As for the original issue, I wonder whether you are storing the string
in UTF-8 or Latin1 encoding. I suspect that the å (a-ring) is actually
a multibyte sequence inside the database and that for some reason
Postgres isn't configured to recognize it as a single logical character.
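
To illustrate the suspicion, here is a small sketch using Python's codecs as a stand-in; nothing in it queries the database. It shows that å occupies one byte in Latin1 and two bytes in UTF-8, so a matcher that walks the string byte by byte sees one more unit than there are characters.

```python
# U+00E5 is the precomposed a-ring; written as an escape so the
# example does not depend on how this file itself is encoded.
text = 'one\u00e5 two three'

latin1_bytes = text.encode('latin-1')
utf8_bytes = text.encode('utf-8')

print(len(text))          # number of characters
print(len(latin1_bytes))  # number of bytes in Latin1
print(len(utf8_bytes))    # number of bytes in UTF-8
print('\u00e5'.encode('latin-1'), '\u00e5'.encode('utf-8'))
```

The character count and the Latin1 byte count agree, while the UTF-8 byte count is one higher because of the two-byte sequence for å.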

regards, tom lane
