From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, peter_e(at)gmx(dot)net, ishii(at)sraoss(dot)co(dot)jp, andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru |
Subject: | Re: pg_trgm |
Date: | 2010-05-29 14:09:12 |
Message-ID: | AANLkTinC6LcLpF16rREFJpOa1q1CzAN8tJAooqXwmalR@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> ! #define iswordchr(c) (lc_ctype_is_c()? \
> ! ((*(c) & 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \
>
Surely isspace(c) will always be false for non-ascii characters in C locale?
Now it might be sensible to just treat any non-ascii character as a
word-character in addition to alpha and digits, so what might make
sense is
(t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c() && (*(c) & 0x80)))
Though I wonder whether it wouldn't be generally more useful to users
to provide the non-space version as an option. I could see that being
useful for people in other circumstances aside from working around
this locale problem.
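
To make the two alternatives concrete, here is a minimal standalone sketch. It is not the pg_trgm code: is_word_byte() and is_nonspace_byte() are hypothetical names, the ctype_is_c argument stands in for lc_ctype_is_c(), and plain isalpha()/isdigit()/isspace() on a single byte stand in for t_isalpha()/t_isdigit()/t_isspace(), which work on whole multibyte characters.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the suggested iswordchr() test:
 * alphanumerics are word characters, and in the C locale any byte
 * with the high bit set (i.e. non-ASCII) is one as well.
 */
static bool
is_word_byte(const unsigned char *c, bool ctype_is_c)
{
    return isalpha(*c) || isdigit(*c) || (ctype_is_c && (*c & 0x80) != 0);
}

/*
 * Hypothetical "non-space" variant: everything except whitespace
 * counts as a word character, punctuation included.
 */
static bool
is_nonspace_byte(const unsigned char *c)
{
    return !isspace(*c);
}
```

The second function is the one that would have to be an explicit option, since it changes which trigrams are extracted for ordinary ASCII text too (punctuation starts to count).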
--
greg