From: | Søren Vainio <sva(at)Netpointers(dot)com> |
---|---|
To: | 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Scadinavian characters in regular expressions |
Date: | 2002-04-09 14:21:43 |
Message-ID: | 910513A5A944D5118BE900C04F67CB5A1F82C7@MAIL |
Lists: | pgsql-sql |
There is obviously a problem with the special characters.
The query SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$';
produced FALSE on a database with ENCODING = 'LATIN1' and TRUE on a database
with ENCODING = 'UNICODE'.
Do you have a suggestion for how I can find the count of two-word strings
with ENCODING = 'UNICODE'?
Thank you
Søren Vainio
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 9 April 2002 15:34
> To: Søren Vainio
> Cc: 'Andreas Joseph Krogh'; 'pgsql-sql(at)postgresql(dot)org'
> Subject: Re: [SQL] Scadinavian characters in regular expressions
>
>
> Søren Vainio <sva(at)Netpointers(dot)com> writes:
> > Using \s does produce FALSE for SELECT 'oneå two three' ~
> > '^[^\s]+[\s][^\s]+$';
> > But it also produces FALSE for any two-word string, e.g.:
> > SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$'; where I would expect TRUE???
> > (I am using PostgreSQL 7.1.3)
>
> I do not believe that Postgres' regular expression engine recognizes \s
> as meaning anything except "s". See
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/functions-matching.html
In the above, it's even worse: the backslashes were eaten by the
string-literal parser, so what arrived at the RE engine was just
^[^s]+[s][^s]+$ ... not likely to produce what you wanted.
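
To make the backslash point concrete, here is a small Python sketch (Python rather than SQL only so that it runs without a server). The helper `strip_backslashes` is hypothetical: it is a deliberately simplified stand-in for what the string-literal parser did to the pattern, not PostgreSQL's actual code, and it ignores escapes such as \n that a real parser would treat specially.

```python
import re

def strip_backslashes(literal: str) -> str:
    """Hypothetical, simplified model of a string-literal parser that
    drops each backslash and keeps the character that follows it."""
    return re.sub(r"\\(.)", r"\1", literal)

typed = r"^[^\s]+[\s][^\s]+$"      # pattern as written in the query
seen = strip_backslashes(typed)     # pattern as the RE engine would receive it

# 'one two' contains no literal "s", so the stripped pattern cannot match.
two_words_matches = re.search(seen, "one two") is not None

# The stripped pattern really looks for text split by a single "s".
split_by_s_matches = re.search(seen, "onestwo") is not None

# A bracket expression holding a literal space needs no backslash at all.
space_pattern_matches = re.search(r"^[^ ]+[ ][^ ]+$", "one two") is not None

print(seen, two_words_matches, split_by_s_matches, space_pattern_matches)
```

Under these assumptions, the literal-space form used in the original query sidesteps the escaping question entirely.
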
As for the original issue, I wonder whether you are storing the string
as UTF-8 or Latin1 encoding. I have a suspicion that the å (a-ring)
is actually a multibyte sequence inside the database
and for some reason Postgres isn't configured to recognize it as a
single logical character.
regards, tom lane
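
The single-byte versus multibyte distinction raised above can be seen outside the database. The following Python sketch is only an analogy under stated assumptions: Python's `re` on `bytes` stands in for an engine that treats every byte as one character, and `re` on `str` stands in for an engine that works on logical characters. It does not show how any particular PostgreSQL build is configured.

```python
import re

text = "one\u00e5"                       # 'oneå'

latin1_bytes = text.encode("latin-1")    # å is the single byte 0xE5
utf8_bytes = text.encode("utf-8")        # å is the two bytes 0xC3 0xA5

# "." matches exactly one unit: one character for str, one byte for bytes.
char_match = re.search(r"^one.$", text) is not None
latin1_match = re.search(rb"^one.$", latin1_bytes) is not None
utf8_match = re.search(rb"^one.$", utf8_bytes) is not None

print(len(latin1_bytes), len(utf8_bytes))
print(char_match, latin1_match, utf8_match)
```

The byte-oriented match succeeds on the Latin-1 bytes but fails on the UTF-8 bytes, because the one "å" the pattern allows for has become two units. A mismatch of this kind between the stored encoding and what the engine assumes is one way the same query could give different answers on the two databases.
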