Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13

From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
Date: 2013-03-20 13:47:01
Message-ID: CAADeyWhm3q67_J2=ED-9X3MYedHPSj6H7i99uADwdmFCXxv09g@mail.gmail.com
Lists: pgsql-general

Thanks for trying! I am using CentOS 6.3.

So it seems to work better in 9.2.x?

Unfortunately, I'd like to stay with 8.4.x for now,
because other projects on the same host
use this PostgreSQL instance.
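
One possible workaround while staying on 8.4.x, as an untested sketch rather than a confirmed fix: skip the regex range and compare code points directly. It assumes a UTF8 database, where ascii() returns the Unicode code point of the first character. The function name is_upper_russian is hypothetical and not part of the original schema.

```sql
-- Hypothetical helper: true if the word has at least 2 characters and every
-- character lies in U+0410..U+042F (decimal 1040..1071), the letters А..Я.
-- Assumes a UTF8 database, where ascii() returns the Unicode code point.
create or replace function is_upper_russian(word varchar) returns boolean as $body$
begin
    if word is null or char_length(word) < 2 then
        return false;
    end if;

    -- Check the code point of each character in turn.
    for i in 1 .. char_length(word) loop
        if ascii(substr(word, i, 1)) not between 1040 and 1071 then
            return false;
        end if;
    end loop;

    return true;
end;
$body$ language plpgsql immutable;

-- Same inputs as the two test queries quoted below.
SELECT is_upper_russian('ПРОВЕРКА');
SELECT is_upper_russian('ABCDE');
```

Like the regex range, this rejects Ё (U+0401), which lies outside U+0410..U+042F. The trigger could then test NOT is_upper_russian(new.word) in place of the !~ check.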

Regards
Alex

On Wed, Mar 20, 2013 at 10:35 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> Alexander Farber wrote:
>> I have prepared an SQL fiddle for my question:
>> http://sqlfiddle.com/#!11/8a494/4
>>
> Strange, it works here (RHEL 6, x86_64, PostgreSQL 9.2.2,
> encoding "UTF8", collation and ctype "de_DE.UTF8"):
>
> test=> SELECT 'ПРОВЕРКА' ~ '^[\u0410-\u042F]{2,}$';
> ?column?
> ----------
> t
> (1 row)
>
> test=> SELECT 'ABCDE' ~ '^[\u0410-\u042F]{2,}$';
> ?column?
> ----------
> f
> (1 row)
>
>> create table good_words (
>>     word varchar(64) primary key
>> );
>>
>> create or replace function keep_clean() returns trigger as $body$
>> begin
>>     new.word := upper(new.word);
>>
>>     /* next line does not compile? */
>>     IF new.word !~ '^[\x0410-\x042F]{2,}$' THEN
>>         RAISE EXCEPTION 'Not an uppercased Russian word in UTF8';
>>     END IF;
>>
>>     IF new.word ~ '^[ЪЫЬ]' OR new.word ~ 'Ъ$' THEN
>>         return NULL;
>>     END IF;
>>
>>     /* does not return NULL for 'ошибббка'? */
>>     IF new.word ~ '(.)\1\1' AND new.word NOT LIKE '%ШЕЕЕ%'
>>        AND new.word NOT LIKE '%ЗМЕЕЕ%' THEN
>>         return NULL;
>
> This works for me as well:
>
> test=> SELECT 'ошибббка' ~ '(.)\1\1'
> AND 'ошибббка' NOT LIKE '%ШЕЕЕ%'
> AND 'ошибббка' NOT LIKE '%ЗМЕЕЕ%';
> ?column?
> ----------
> t
> (1 row)
>
> test=> SELECT 'ошиббка' ~ '(.)\1\1'
> AND 'ошиббка' NOT LIKE '%ШЕЕЕ%'
> AND 'ошиббка' NOT LIKE '%ЗМЕЕЕ%';
> ?column?
> ----------
> f
> (1 row)
>
>>     END IF;
>>
>>     return new;
>> end;
>> $body$ language plpgsql;
>
> What do you get for
>
> SELECT pg_encoding_to_char(encoding),
>        datcollate,
>        datctype
> FROM pg_database WHERE datname = current_database();
>
> and for
>
> SHOW client_encoding;
>
> Yours,
> Laurenz Albe
