From: | Andreas Joseph Krogh <andreak(at)officenet(dot)no> |
---|---|
To: | "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Scadinavian characters in regular expressions |
Date: | 2002-04-09 09:53:29 |
Message-ID: | 200204091153.29935.andreak@officenet.no |
Lists: | pgsql-sql |
On Tuesday 09 April 2002 10:51, Søren Vainio wrote:
> Can someone please explain the following?
> I am using a regular expression to find strings containing two words (begin
> with one or more characters not being spaces followed by a space followed
> by one or more characters not being spaces).
> But when Scandinavian characters are included it returns different results
> depending on where the character is positioned.
> The first two-word example returns TRUE as expected.
> The second three-word example returns FALSE as expected.
> But when I let an å (a-ring) traverse through the string it
> unexpectedly returns TRUE when the character is positioned as the
> second-last or last character in the first two words.
>
> SELECT 'one two' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> SELECT 'one two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'åone two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'oåne two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'onåe two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> SELECT 'one åtwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one tåwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one twåo three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> SELECT 'one twoå three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> SELECT 'one two åthree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one two tåhree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one two thåree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one two thråee' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one two threåe' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> SELECT 'one two threeå' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
>
> Thank you for any response.
>
> Søren Vainio, Denmark
I just tried the following, which returned false as expected:
andreak=# SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
 ?column?
----------
 f
(1 row)

andreak=# select version();
                          version
-----------------------------------------------------------
 PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)
NOTE: I replaced your [^ ] with the properly formatted pattern for whitespace:
[^\s]
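
A minimal sketch of how the test could be repeated, not taken from the thread itself. It uses the POSIX bracket class [[:space:]] for "any whitespace", which does not depend on how backslashes inside the string literal are treated, and it first checks the encodings in play. That an encoding mismatch is behind the position-dependent results is only an assumption here; the thread does not establish the cause, and output on a given installation will vary.

```sql
-- Which encodings are the server and this session using?
SELECT getdatabaseencoding();
SHOW client_encoding;

-- In a multi-byte database encoding such as UTF8, 'å' is one character
-- but two bytes; in a single-byte encoding such as LATIN1 both are 1.
SELECT char_length('å') AS chars, octet_length('å') AS bytes;

-- The same two-word test, written with POSIX character classes.
SELECT 'one two'        ~ '^[^[:space:]]+[[:space:]][^[:space:]]+$';  -- true
SELECT 'one two three'  ~ '^[^[:space:]]+[[:space:]][^[:space:]]+$';  -- false

-- The case from the original report; a match here would point at the
-- server's handling of the non-ASCII character rather than the pattern.
SELECT 'oneå two three' ~ '^[^[:space:]]+[[:space:]][^[:space:]]+$';
```

Note that [[:space:]] also matches tabs and newlines, so it is slightly broader than the literal space in the original [^ ] pattern.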
--
Andreas Joseph Krogh (Senior Software Developer) <andreak(at)officenet(dot)no>
A hen is an egg's way of making another egg.