Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13

From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
Date: 2013-03-22 14:23:42
Message-ID: CAADeyWgrVpRsG6baR1_oPGJWRNN3bWiCNofKKG0LTf3PAiH3Tw@mail.gmail.com
Lists: pgsql-general

Hello,

Unfortunately, octal escapes don't seem to work either:

On Tue, Mar 19, 2013 at 7:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Farber <alexander(dot)farber(at)gmail(dot)com> writes:
>> # select 'АБВГД' ~ '^[\u0410-\u042F]{2,}$';
>> WARNING: nonstandard use of escape in a string literal
>
> I think Unicode escapes were introduced in 9.0. In 8.4 you'd probably
> have to write out the UTF8 equivalent as octal escapes :-(

# select 'АБВГД' ~ '^[\2020-\2057]{2,}$';
WARNING: nonstandard use of escape in a string literal
LINE 1: select 'АБВГД' ~ '^[\2020-\2057]{2,}$';
                         ^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid byte sequence for encoding "UTF8": 0x82
HINT: This error can also happen if the byte sequence does not
match the encoding expected by the server, which is controlled by
"client_encoding".
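
One possible reading of the 0x82 in the error above, sketched in Python outside the database: an octal escape takes at most three digits, so \2020 would be read as the single byte \202 (0x82) followed by a literal 0, and a lone 0x82 is not valid UTF-8. The helper name utf8_octal_escapes is made up for this illustration. It only prints the per-byte octal escapes of the two range endpoints; whether 8.4.13 accepts them inside a bracket range is not tested here.

```python
def utf8_octal_escapes(ch):
    """Return the UTF-8 bytes of ch as backslash-octal escapes, one per byte."""
    return "".join("\\%03o" % b for b in ch.encode("utf-8"))

# Endpoints of the uppercase range from the Unicode chart: U+0410 and U+042F.
first = utf8_octal_escapes("\u0410")
last = utf8_octal_escapes("\u042f")
print(first, last)

# \202 on its own is the byte 0x82, a UTF-8 continuation byte with no lead byte.
stray = int("202", 8)
print(hex(stray))
```

Each letter needs two escapes because U+0410 through U+042F are two-byte sequences in UTF-8.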

But writing the Cyrillic letters out literally seems to work
(I am trying to detect uppercase Russian letters, as per
http://www.unicode.org/charts/PDF/U0400.pdf ):

# select 'АБВГД' ~ '^[А-Я]{2,}$';
 ?column?
----------
 t
(1 row)

Then I tried to solve my second problem (detecting the same
letter 3 times in a row, which is rare in Russian):

# select 'ОШИБББКА' ~ '(.)\1\1';
WARNING: nonstandard use of escape in a string literal
LINE 1: select 'ОШИБББКА' ~ '(.)\1\1';
                            ^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
 ?column?
----------
 f
(1 row)

Does anybody please know why this fails in 8.4.13?

According to Table 9-18 in
http://www.postgresql.org/docs/8.4/static/functions-matching.html
it should be OK to use \1 to refer back to the
part captured by the parentheses?
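
A possible explanation, hedged: the WARNING suggests the backslashes are consumed by the string literal before the regex engine sees them, so \1 would arrive as the byte 0x01 rather than as a back reference. The sketch below reproduces that effect in Python. Its re module is not PostgreSQL's regex engine, so this is only an analogy, and the variable names are made up for illustration.

```python
import re

word = "ОШИБББКА"

# Backslashes eaten by the string literal: "\1" becomes the character 0x01,
# so the pattern asks for any character followed by two 0x01 characters.
eaten = "(.)\1\1"
# Backslashes preserved: the regex engine sees \1 as a back reference.
kept = "(.)\\1\\1"

eaten_match = re.search(eaten, word)
kept_match = re.search(kept, word)

print(repr(eaten), eaten_match)
print(repr(kept), kept_match.group(0))
```

If this reading is right, the escape string syntax from the HINT with doubled backslashes, E'(.)\\1\\1', may be what 8.4 needs so that \1 reaches the regex engine intact.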

Regards
Alex
