Re: pg_trgm

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, peter_e(at)gmx(dot)net, ishii(at)sraoss(dot)co(dot)jp, andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru
Subject:	Re: pg_trgm
Date:	2010-05-29 14:09:12
Message-ID:	AANLkTinC6LcLpF16rREFJpOa1q1CzAN8tJAooqXwmalR@mail.gmail.com
Lists:	pgsql-hackers

On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> ! #define iswordchr(c) (lc_ctype_is_c()? \
> ! ((*(c) & 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \
>

Surely isspace(c) will always be false for non-ascii characters in C locale?

Now it might be sensible to just treat any non-ascii character as a
word-character in addition to alpha and digits, so what might make
sense is

t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c() && (*(c) & 0x80))
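
As a rough standalone sketch of that test (not the actual pg_trgm code): the flag ctype_is_c below is a hypothetical stand-in for lc_ctype_is_c(), plain isalpha()/isdigit() stand in for t_isalpha()/t_isdigit(), and is_word_char is an illustrative name only.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's lc_ctype_is_c(); an assumption
 * made so the sketch compiles on its own. */
static bool ctype_is_c = true;

/* Returns true if the byte at c should count as a word character:
 * alphabetic, a digit, or (only under the C locale) any byte with the
 * high bit set, i.e. part of a non-ASCII character. */
static bool
is_word_char(const char *c)
{
    /* Cast first: passing a negative char to the <ctype.h> functions
     * is undefined behaviour. */
    unsigned char ch = (unsigned char) *c;

    if (isalpha(ch) || isdigit(ch))
        return true;

    return ctype_is_c && (ch & 0x80) != 0;
}
```

With the flag set, a high-bit byte such as 0xC3 counts as a word character while a space does not; with the flag cleared, the decision is left to isalpha()/isdigit() alone.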

Though I wonder whether it wouldn't be generally more useful to users
to provide the non-space version as an option. I could see that being
useful for people in other circumstances aside from working around
this locale problem.

--
greg

In response to

Re: pg_trgm at 2010-05-29 08:13:28 from Tatsuo Ishii
