From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org |
Subject: | Re: UTF8MatchText |
Date: | 2007-05-17 19:57:25 |
Message-ID: | 464CB3A5.9020600@dunslane.net |
Lists: | pgsql-hackers pgsql-patches |
Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> Except that the entire point of this patch is to dumb down NextChar to
>>> be the same as NextByte for UTF8 strings.
>>>
>
>
>> That's not what I see in (what I think is) the latest submission, which
>> includes this snippet:
>>
>
> [ scratches head... ] OK, then I think I totally missed what this patch
> is trying to accomplish; because this code looks just the same as the
> existing multibyte-character operations. Where does the performance
> improvement come from?
>
That's what bothered me. The trouble is that we have so much code that
looks *almost* identical.
From my WIP patch, here's where the difference appears to be. Note
that the UTF8 branch has two NextByte calls at the bottom, unlike the
other branch:
#ifdef UTF8_OPT
		/*
		 * UTF8 is optimised to do byte at a time matching in most cases,
		 * thus saving expensive calls to NextChar.
		 *
		 * UTF8 has disjoint representations for first-bytes and
		 * not-first-bytes of MB characters, and thus it is
		 * impossible to make a false match in which an MB pattern
		 * character is matched to the end of one data character
		 * plus the start of another.
		 * In character sets without that property, we have to use the
		 * slow way to ensure we don't make out-of-sync matches.
		 */
		else if (*p == '_')
		{
			NextChar(t, tlen);
			NextByte(p, plen);
			continue;
		}
		else if (!BYTEEQ(t, p))
		{
			/*
			 * Not the single-character wildcard and no explicit match? Then
			 * time to quit...
			 */
			return LIKE_FALSE;
		}
		NextByte(t, tlen);
		NextByte(p, plen);
#else
		/*
		 * Branch for non-utf8 multi-byte charsets and also for single-byte
		 * charsets which don't gain any benefit from the above
		 * optimisation.
		 */
		else if ((*p != '_') && !CHAREQ(t, p))
		{
			/*
			 * Not the single-character wildcard and no explicit match? Then
			 * time to quit...
			 */
			return LIKE_FALSE;
		}
		NextChar(t, tlen);
		NextChar(p, plen);
#endif /* UTF8_OPT */
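The argument in the UTF8_OPT comment can be illustrated with a small, self-contained sketch. This is not the PostgreSQL code: it is a simplified, hypothetical LIKE matcher for well-formed UTF-8 only (no escape character, a plain bool result instead of LIKE_TRUE/LIKE_FALSE/LIKE_ABORT), and the names like_utf8, like_utf8_cstr, next_char, utf8_char_len and utf8_is_continuation are invented for illustration. It mirrors the shape of the UTF8 branch above: '_' advances the text by one whole character but the pattern by one byte, and every other pattern byte is compared against the text one byte at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * UTF-8 continuation bytes always look like 10xxxxxx, while a first byte
 * is 0xxxxxxx or 11xxxxxx.  Because the two sets are disjoint, a pattern
 * character (which starts with a first byte) can never compare equal to a
 * byte in the middle of a text character.
 */
bool
utf8_is_continuation(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

/* Byte length of the character whose first byte is b (assumes valid UTF-8). */
size_t
utf8_char_len(unsigned char b)
{
	assert(!utf8_is_continuation(b));	/* caller must pass a first byte */
	if (b < 0x80)
		return 1;
	if ((b & 0xE0) == 0xC0)
		return 2;
	if ((b & 0xF0) == 0xE0)
		return 3;
	return 4;
}

/* Skip one whole character: a hypothetical analogue of NextChar. */
void
next_char(const unsigned char **s, size_t *len)
{
	size_t		n = utf8_char_len(**s);

	if (n > *len)				/* truncated input: don't run off the end */
		n = *len;
	*s += n;
	*len -= n;
}

/* Simplified LIKE: '_' = one character, '%' = any sequence, no escapes. */
bool
like_utf8(const unsigned char *t, size_t tlen,
		  const unsigned char *p, size_t plen)
{
	while (plen > 0)
	{
		if (*p == '%')
		{
			p++;
			plen--;
			if (plen == 0)
				return true;	/* trailing '%' matches the rest */
			/* retry the rest of the pattern at each character boundary */
			for (;;)
			{
				if (like_utf8(t, tlen, p, plen))
					return true;
				if (tlen == 0)
					return false;
				next_char(&t, &tlen);
			}
		}
		if (tlen == 0)
			return false;
		if (*p == '_')
		{
			next_char(&t, &tlen);	/* whole character of the text */
			p++;					/* but only one byte of the pattern */
			plen--;
			continue;
		}
		if (*t != *p)			/* byte comparison, like BYTEEQ */
			return false;
		t++;
		tlen--;
		p++;
		plen--;
	}
	return tlen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
bool
like_utf8_cstr(const char *text, const char *pattern)
{
	return like_utf8((const unsigned char *) text, strlen(text),
					 (const unsigned char *) pattern, strlen(pattern));
}
```

For example, like_utf8_cstr("a\xC3\xA9", "a_") (the text "aé") should return true, while the pattern "a__" should not. The '%' case deliberately retries only at character boundaries so that a following '_' can never start in the middle of a character; the simple recursive retry is easy to follow but can be slow on pathological patterns, so it should not be read as a description of how the real matcher handles '%'.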
cheers
andrew
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-05-17 20:04:32 | Re: CREATE TABLE LIKE INCLUDING INDEXES support |
Previous Message | Marc G. Fournier | 2007-05-17 19:49:24 | Re: 8.3 release date on web site |