From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: UTF8MatchText |
Date: | 2007-05-18 03:06:05 |
Message-ID: | 464D181D.9010307@dunslane.net |
Lists: | pgsql-hackers pgsql-patches |
Tom Lane wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
>
>> Yes, I only used the 'disjoint representations for first-bytes and
>> not-first-bytes of MB characters' feature in UTF8. Other encodings
>> allow both [AB] and [BA] for MB character patterns. UTF8Match() does
>> not cope with those encodings; if we have '[AB][AB]' in a table and
>> search it with LIKE '%[BA]%', we mistakenly judge that they match.
>>
>
> AFAICS, the patch does *not* make that mistake because % will not
> advance over a fractional character.
>
>
Yeah, I think that's right.
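
To make the point concrete, here is a minimal sketch. It is not code from the patch: the two-byte encoding, toy_mblen() and like_match() are all made up for illustration, and the matcher handles only '%' and literal bytes. With whole_chars set, '%' skips one complete character at a time, so the [BA] pattern is never tried at an offset inside a character.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy encoding (an assumption, not a real backend encoding):
 * any byte >= 0x80 starts a two-byte character, and the second byte may
 * also be >= 0x80, so lead-byte and trail-byte ranges overlap.
 */
static size_t
toy_mblen(const unsigned char *s)
{
    return (s[0] >= 0x80 && s[1] != '\0') ? 2 : 1;
}

/*
 * Simplified LIKE on NUL-terminated strings: only '%' and literal bytes
 * are supported (no '_' and no escape handling).
 * whole_chars = true:  '%' advances over the text one character at a time.
 * whole_chars = false: '%' advances one byte at a time.
 */
static bool
like_match(const unsigned char *t, const unsigned char *p, bool whole_chars)
{
    while (*p != '\0')
    {
        if (*p == '%')
        {
            /* Consecutive '%' behave like a single one. */
            while (*p == '%')
                p++;
            if (*p == '\0')
                return true;    /* trailing '%' matches any remainder */

            /* Try the rest of the pattern at each allowed text offset. */
            for (;;)
            {
                if (like_match(t, p, whole_chars))
                    return true;
                if (*t == '\0')
                    return false;
                t += whole_chars ? toy_mblen(t) : 1;
            }
        }

        /* Literal pattern byte: must equal the current text byte. */
        if (*t == '\0' || *t != *p)
            return false;
        t++;
        p++;
    }
    return *t == '\0';
}
```

With A = 0x81 and B = 0x82, the text A B A B searched with % B A % is rejected when whole_chars is true, and falsely accepted when '%' is allowed to stop on the second byte of a character.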
Attached is my current WIP patch. If we decide that this optimisation
can in fact be applied to all backend encodings, that will be easy to
incorporate, and it will simplify the code further. Note that all the
common code in the MatchText and do_like_escape functions has been
factored out, and the bytea functions now just call the single-byte
text versions. AFAICS the effect will be identical to having the
specialised versions. (I'm always happy when code volume can be reduced.)
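
As a rough illustration of that kind of factoring (a sketch only, not the patch itself; every name below is invented for the example, and escape handling is left out), one generic matcher can take the per-encoding "length of next character" step as a parameter, with the bytea entry point simply forwarding to the single-byte one:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: byte length of the character starting at s. */
typedef size_t (*charlen_fn) (const unsigned char *s, size_t remaining);

static size_t
sb_charlen(const unsigned char *s, size_t remaining)
{
    (void) s;
    (void) remaining;
    return 1;                   /* single-byte encodings and raw bytes */
}

static size_t
utf8_charlen(const unsigned char *s, size_t remaining)
{
    size_t      len = 1;

    if ((*s & 0xE0) == 0xC0)
        len = 2;
    else if ((*s & 0xF0) == 0xE0)
        len = 3;
    else if ((*s & 0xF8) == 0xF0)
        len = 4;
    /* Never step past the end of a truncated string. */
    return len <= remaining ? len : remaining;
}

/* Shared logic: '%' and '_' wildcards plus literal bytes, no escapes. */
static bool
match_generic(const unsigned char *t, size_t tlen,
              const unsigned char *p, size_t plen, charlen_fn charlen)
{
    while (plen > 0)
    {
        if (*p == '%')
        {
            while (plen > 0 && *p == '%')
            {
                p++;
                plen--;
            }
            if (plen == 0)
                return true;
            for (;;)
            {
                if (match_generic(t, tlen, p, plen, charlen))
                    return true;
                if (tlen == 0)
                    return false;
                size_t      n = charlen(t, tlen);

                t += n;
                tlen -= n;
            }
        }
        if (tlen == 0)
            return false;
        if (*p == '_')
        {
            /* '_' consumes exactly one character, however many bytes. */
            size_t      n = charlen(t, tlen);

            t += n;
            tlen -= n;
        }
        else
        {
            if (*t != *p)
                return false;
            t++;
            tlen--;
        }
        p++;
        plen--;
    }
    return tlen == 0;
}

static bool
sb_like(const unsigned char *t, size_t tlen,
        const unsigned char *p, size_t plen)
{
    return match_generic(t, tlen, p, plen, sb_charlen);
}

static bool
utf8_like(const unsigned char *t, size_t tlen,
          const unsigned char *p, size_t plen)
{
    return match_generic(t, tlen, p, plen, utf8_charlen);
}

/* bytea needs nothing special: it reuses the single-byte version. */
static bool
bytea_like(const unsigned char *t, size_t tlen,
           const unsigned char *p, size_t plen)
{
    return sb_like(t, tlen, p, plen);
}
```

The only per-encoding difference in this sketch is how far '_' and '%' step, which is why the bytea wrapper can be a one-liner.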
cheers
andrew
Attachment | Content-Type | Size |
---|---|---|
utf8.patch | text/x-patch | 24.7 KB |