Re: Queries with Regular Expressions

From: "John D(dot) Burger" <john(at)mitre(dot)org>
To: PostgreSQL-general general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Queries with Regular Expressions
Date: 2006-04-06 20:13:01
Message-ID: eba60a0256eed477b6be464374a7594c@mitre.org
Lists: pgsql-general

> But I just can't make it work correctly using brackets:
> SELECT field FROM table WHERE field ~* 'ch[aã]o';
>
> It just returns tuples that have 'chao', but not 'chão'.
>
> My queries are UTF-8 and the database is SQL_ASCII.

I suspect the bracketed expression is turning into [aXY], where X and Y
are the two bytes that encode ã in UTF-8. A bracket expression matches
exactly one character, and in a SQL_ASCII database a character is a
single byte, so the regular expression is only going to match strings
of the form chao, chXo and chYo. To make sure that this is what's
happening, try this:

select length('ã');

I bet you get back 2, not 1. I don't know if a UTF8 database will
handle this correctly or not. The safest thing to do may be to use
queries like this:

SELECT field FROM table WHERE field ~* 'ch(a|ã)o';
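
To see the difference between the two patterns outside the database,
here is a rough Python sketch. It is only a model of what I suspect is
going on, not PostgreSQL's actual regex engine: it assumes SQL_ASCII
means both the pattern and the stored value are handled as raw UTF-8
bytes, and the variable names are made up for illustration.

```python
import re

# Assumption: a SQL_ASCII database sees text as raw bytes, so both the
# stored value and the pattern are modeled as UTF-8 byte strings.
word = "ch\u00e3o".encode("utf-8")     # "chão" as bytes
a_tilde = "\u00e3".encode("utf-8")     # "ã" is two bytes in UTF-8

byte_length = len(a_tilde)

# Bracket expression: each byte of ã becomes its own alternative, so the
# class matches exactly one byte ('a', or either byte of ã).
bracket = re.compile(b"ch[a" + a_tilde + b"]o", re.IGNORECASE)

# Alternation: the two bytes of ã stay together as one sequence.
alternation = re.compile(b"ch(a|" + a_tilde + b")o", re.IGNORECASE)

bracket_matches_chao = bracket.search(b"chao") is not None
bracket_matches_word = bracket.search(word) is not None
alt_matches_chao = alternation.search(b"chao") is not None
alt_matches_word = alternation.search(word) is not None

print("bytes in ã:", byte_length)
print("bracket:     chao", bracket_matches_chao, "| chão", bracket_matches_word)
print("alternation: chao", alt_matches_chao, "| chão", alt_matches_word)
```

In this model the bracket form finds 'chao' but misses 'chão', which is
the behavior reported above, while the alternation form finds both.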

- John D. Burger
MITRE
