From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org, Giorgio Valoti <giorgio_v(at)mac(dot)com> |
Subject: | tsearch is non-multibyte-aware in a few places |
Date: | 2008-06-19 16:29:11 |
Message-ID: | 15580.1213892951@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I've identified the cause of bug #4253:
    /* Trim trailing space */
    while (*pbuf && !t_isspace(pbuf))
        pbuf++;
    *pbuf = '\0';
At least on Macs, t_isspace is capable of returning "true" when pointed
at the second byte of a 2-byte UTF8 character. This explains the report
that the letter "" has a problem when some other ones don't. Of
course pbuf needs to be incremented using pg_mblen not just ++.
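A minimal sketch of that fix, written as a standalone program rather than as PostgreSQL code. The helpers `utf8_mblen` and `naive_isspace` are hypothetical stand-ins for `pg_mblen` and `t_isspace`; the stand-in whitespace test deliberately treats the byte 0xA0 as whitespace to imitate the misbehavior described above, and the choice of 0xA0 is an assumption made for illustration only.

```c
/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8 character
 * starting at s, clamped so that we never step past the terminating NUL.
 */
static int
utf8_mblen(const char *s)
{
    unsigned char c = (unsigned char) *s;
    int         len = 1;
    int         i;

    if ((c & 0xE0) == 0xC0)
        len = 2;
    else if ((c & 0xF0) == 0xE0)
        len = 3;
    else if ((c & 0xF8) == 0xF0)
        len = 4;

    for (i = 1; i < len; i++)
        if (s[i] == '\0')
            return i;           /* truncated character: stop at the NUL */
    return len;
}

/*
 * Hypothetical stand-in for t_isspace(). It accepts ASCII whitespace and,
 * to imitate the reported misbehavior, the lone byte 0xA0. This is an
 * assumption for the demo, not PostgreSQL's implementation.
 */
static int
naive_isspace(const char *s)
{
    unsigned char c = (unsigned char) *s;

    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == 0xA0;
}

/* Byte-wise loop, as in the quoted code: can stop inside a character. */
static void
truncate_bytewise(char *pbuf)
{
    while (*pbuf && !naive_isspace(pbuf))
        pbuf++;
    *pbuf = '\0';
}

/* Character-wise loop: only ever tests the first byte of each character. */
static void
truncate_charwise(char *pbuf)
{
    while (*pbuf && !naive_isspace(pbuf))
        pbuf += utf8_mblen(pbuf);
    *pbuf = '\0';
}
```

With an input whose last letter before the space is encoded as the bytes 0xC3 0xA0, the byte-wise loop cuts the string between those two bytes, while the character-wise loop steps over both and stops at the real space.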
I looked around for other occurrences of the same problem and found
a couple. I also found occurrences of the same pattern for skipping
whitespace:
    while (*s && t_isspace(s))
        s++;
This is safe if and only if t_isspace is never true for multibyte
characters ... can anyone think of a counterexample?
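One way to look for a counterexample on a given platform is to ask the C library directly. The sketch below assumes that the whitespace test ultimately defers to the platform's `iswspace()` and that `wchar_t` values are Unicode code points; neither is guaranteed by the C standard. The function name `count_nonascii_spaces` is hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count code points in [0x80, 0xFFFF] that iswspace() accepts under the
 * named locale, optionally printing each one. Every such code point needs
 * more than one byte in UTF-8. Returns -1 if the locale cannot be selected.
 * Note: this changes the process-wide LC_CTYPE setting.
 */
static int
count_nonascii_spaces(const char *locale_name, int verbose)
{
    unsigned long cp;
    int         count = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (cp = 0x80; cp <= 0xFFFF; cp++)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;           /* surrogates are not characters */
        if (iswspace((wint_t) cp))
        {
            count++;
            if (verbose)
                printf("U+%04lX\n", cp);
        }
    }
    return count;
}
```

Calling `count_nonascii_spaces("", 1)` probes the locale taken from the environment. Any code point it prints would be a multibyte whitespace character under the assumptions above, so a non-zero count would mean the `s++` pattern is not safe on that platform.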
regards, tom lane