From: | Søren Vainio <sva(at)Netpointers(dot)com> |
---|---|
To: | 'Andreas Joseph Krogh' <andreak(at)officenet(dot)no>, "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Scadinavian characters in regular expressions |
Date: | 2002-04-09 11:28:38 |
Message-ID: | 910513A5A944D5118BE900C04F67CB5A1F82C6@MAIL |
Lists: | pgsql-sql |
Using \s does produce FALSE for
SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
But it also produces FALSE for any two-word string, e.g.:
SELECT 'one two' ~ '^[^\s]+[\s][^\s]+$';
where I would expect TRUE.
(I am using PostgreSQL 7.1.3.)
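
As a point of comparison, the expected results can be sketched in Python. This is only a minimal illustration of what the two patterns are meant to do. Python's `re` module is a different regex engine from the one in PostgreSQL 7.1.3 or 7.2, so it says nothing about why the server answers differently. The helper name `is_two_words` and the sample list are made up for this example.

```python
import re

# Pattern from the thread: a run of non-space characters, one space,
# and another run of non-space characters, anchored at both ends.
TWO_WORDS = re.compile(r"^[^ ]+[ ][^ ]+$")

# The \s variant. Python's re treats \s inside brackets as "any whitespace";
# other engines may interpret it differently.
TWO_WORDS_WS = re.compile(r"^[^\s]+[\s][^\s]+$")


def is_two_words(text, pattern=TWO_WORDS):
    """Return True when text consists of exactly two space-separated words."""
    return pattern.search(text) is not None


# Hypothetical sample, taken from the strings quoted in this thread.
samples = [
    "one two",
    "one two three",
    "onåe two three",
    "oneå two three",
    "one twoå three",
]

for s in samples:
    print(repr(s), is_two_words(s), is_two_words(s, TWO_WORDS_WS))
```

Here only 'one two' matches either pattern. Every three-word string is rejected regardless of where the å sits, which is the behaviour I would expect from the server as well.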
> -----Original message-----
> From: pgsql-sql-owner(at)postgresql(dot)org
> [mailto:pgsql-sql-owner(at)postgresql(dot)org] On behalf of Andreas
> Joseph Krogh
> Sent: 9 April 2002 11:53
> To: 'pgsql-sql(at)postgresql(dot)org'
> Subject: Re: [SQL] Scadinavian characters in regular expressions
>
>
> On Tuesday 09 April 2002 10:51, Søren Vainio wrote:
> > Can someone please explain the following?
> > I am using a regular expression to find strings containing two words
> > (beginning with one or more non-space characters, followed by a space,
> > followed by one or more non-space characters).
> > But when Scandinavian characters are included, it returns different
> > results depending on where the character is positioned.
> > The first two-word example returns TRUE as expected.
> > The second three-word example returns FALSE as expected.
> > But when I let an å (a-ring) traverse through the string, it
> > unexpectedly returns TRUE when the character is positioned as the
> > second-last or last character in the first two words.
> >
> > SELECT 'one two' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'åone two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'oåne two three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'onåe two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'oneå two three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one åtwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one tåwo three' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one twåo three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one twoå three' ~ '^[^ ]+[ ][^ ]+$'; returns TRUE
> > SELECT 'one two åthree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two tåhree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thåree' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two thråee' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threåe' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> > SELECT 'one two threeå' ~ '^[^ ]+[ ][^ ]+$'; returns FALSE
> >
> > Thank you for any response.
> >
> > Søren Vainio, Denmark
>
> I just tried the following which returned false as expected:
> andreak=# SELECT 'oneå two three' ~ '^[^\s]+[\s][^\s]+$';
> ?column?
> ----------
> f
> (1 row)
>
> andreak=# select version();
> version
> -----------------------------------------------------------
> PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> NOTE: I replaced your [^ ] with the properly formatted pattern
> for whitespace:
> [^\s]
>
> --
> Andreas Joseph Krogh (Senior Software Developer)
> <andreak(at)officenet(dot)no>
> A hen is an egg's way of making another egg.
>