Scandinavian characters in regular expressions

From: Søren Vainio <sva(at)Netpointers(dot)com>
To: "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Scandinavian characters in regular expressions
Date: 2002-04-09 08:51:33
Message-ID: 910513A5A944D5118BE900C04F67CB5A1F82C5@MAIL
Lists: pgsql-sql

Can someone please explain the following?
I am using a regular expression to find strings consisting of exactly two
words (one or more non-space characters, followed by a single space,
followed by one or more non-space characters).
But when Scandinavian characters are included, it returns different results
depending on where the character is positioned.
The first, two-word example returns TRUE as expected.
The second, three-word example returns FALSE as expected.
But when I let an å (&#229;, &aring;, a-ring) traverse the string, it
unexpectedly returns TRUE when the character is the second-to-last or last
character of either of the first two words.

SELECT 'one two' ~ '^[^ ]+[ ][^ ]+$';        -- returns TRUE
SELECT 'one two three' ~ '^[^ ]+[ ][^ ]+$';  -- returns FALSE
SELECT 'åone two three' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'oåne two three' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'onåe two three' ~ '^[^ ]+[ ][^ ]+$'; -- returns TRUE
SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; -- returns TRUE
SELECT 'one åtwo three' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one tåwo three' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one twåo three' ~ '^[^ ]+[ ][^ ]+$'; -- returns TRUE
SELECT 'one twoå three' ~ '^[^ ]+[ ][^ ]+$'; -- returns TRUE
SELECT 'one two åthree' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one two tåhree' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one two thåree' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one two thråee' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one two threåe' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
SELECT 'one two threeå' ~ '^[^ ]+[ ][^ ]+$'; -- returns FALSE
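
For comparison, here is a minimal sketch of the same experiment outside the
database, using Python's re module. It is only a cross-check under the
assumption that the strings are handled as Unicode characters and not as raw
bytes; it says nothing about how PostgreSQL evaluates the pattern. The helper
names two_words and traverse are made up for this illustration.

```python
import re

# Same pattern as in the SELECT statements: exactly two space-free words
# separated by a single space.
PATTERN = re.compile(r'^[^ ]+[ ][^ ]+$')


def two_words(text):
    """Return True if text consists of exactly two space-separated words."""
    return PATTERN.match(text) is not None


def traverse(text, ch):
    """Return copies of text with ch inserted at every possible position."""
    return [text[:i] + ch + text[i:] for i in range(len(text) + 1)]


if __name__ == '__main__':
    print('one two', two_words('one two'))
    print('one two three', two_words('one two three'))
    # Let the a-ring walk through the three-word string, as in the queries.
    for candidate in traverse('one two three', 'å'):
        print(candidate, two_words(candidate))
```

With character-aware matching, every three-word variant is rejected wherever
the å is placed, which is the behaviour I expected from the queries above.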

Thank you for any response.

Søren Vainio, Denmark
