| From: | Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> | 
|---|---|
| To: | "Alexander Farber *EXTERN*" <alexander(dot)farber(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13 | 
| Date: | 2013-03-20 09:35:04 | 
| Message-ID: | A737B7A37273E048B164557ADEF4A58B057C9A5F@ntex2010a.host.magwien.gv.at | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Alexander Farber wrote:
> I have prepared an SQL fiddle for my question:
> http://sqlfiddle.com/#!11/8a494/4
> 
> And also described it in more detail at
> http://stackoverflow.com/questions/15500270/string-matching-in-insert-trigger-how-to-use-in-
> conditionals-to-return-null
> 
> Does anybody please know how to check for
> UTF8 range \x0410-\x042F in my code below?
> 
> I've tried both
> new.word !~ '^[\x0410-\x042F]{2,}$'
> (fails with syntax error) and
> new.word !~ '^[\u0410-\u042F]{2,}$'
> (triggers even for correct words):
Strange, it works here (RHEL 6, x86_64, PostgreSQL 9.2.2,
encoding "UTF8", collation and ctype "de_DE.UTF8"):
test=> SELECT 'ПРОВЕРКА' ~ '^[\u0410-\u042F]{2,}$';
 ?column?
----------
 t
(1 row)
test=> SELECT 'ABCDE' ~ '^[\u0410-\u042F]{2,}$';
 ?column?
----------
 f
(1 row)
> create table good_words (
>         word varchar(64) primary key
> );
> 
> create or replace function keep_clean() returns trigger as $body$
>         begin
>                 new.word := upper(new.word);
> 
>                 /* next line does not compile? */
>                 IF new.word !~ '^[\x0410-\x042F]{2,}$' THEN
>                     RAISE EXCEPTION 'Not an uppercased Russian word in UTF8';
>                 END IF;
> 
>                 IF new.word ~ '^[ЪЫЬ]' OR new.word ~ 'Ъ$' THEN
>                     return NULL;
>                 END IF;
> 
>                 /* does not return NULL for 'ошибббка'? */
>                 IF new.word ~ '(.)\1\1' AND new.word NOT LIKE '%ШЕЕЕ%'
> AND new.word NOT LIKE '%ЗМЕЕЕ%' THEN
>                     return NULL;
This works for me as well:
test=> SELECT 'ошибббка' ~ '(.)\1\1'
          AND 'ошибббка' NOT LIKE '%ШЕЕЕ%'
          AND 'ошибббка' NOT LIKE '%ЗМЕЕЕ%';
 ?column?
----------
 t
(1 row)
test=> SELECT 'ошиббка' ~ '(.)\1\1'
          AND 'ошиббка' NOT LIKE '%ШЕЕЕ%'
          AND 'ошиббка' NOT LIKE '%ЗМЕЕЕ%';
 ?column?
----------
 f
(1 row)
>                 END IF;
> 
>                 return new;
>         end;
> $body$ language plpgsql;
What do you get for
SELECT pg_encoding_to_char(encoding),
       datcollate,
       datctype
FROM pg_database WHERE datname = current_database();
and for
SHOW client_encoding;
Yours,
Laurenz Albe
| From | Date | Subject | |
|---|---|---|---|
| Next Message | 卢瑛 | 2013-03-20 10:09:21 | Npgsql Integrated Authentication Problem | 
| Previous Message | Albe Laurenz | 2013-03-20 08:46:49 | Re: Alphanumeric natural order sorting |