
Re: UTF8MatchText

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: UTF8MatchText
Date:	2007-05-20 14:21:58
Message-ID:	46505986.1020005@dunslane.net
Lists:	pgsql-hackers pgsql-patches

Oops, the patch is attached this time.

Andrew Dunstan wrote:
>
>
> I wrote:
>>
>>
>>>
>>> This is only a problem when the pattern looks like '%_'; we could
>>> detect that case and match byte by byte in all other cases.
>>> Currently we check (*p == '\\') || (*p == '_') on every iteration
>>> while scanning over characters for '%'. We could do that check once
>>> and have separate loops for the two cases.
>>>
>>> Other than this part that I think can be optimized I don't see
>>> anything wrong with the idea behind the patch. To make the '%' case
>>> fast might be an important optimization for a lot of use cases. It's
>>> not uncommon that '%' matches a bigger part of the string than the
>>> rest of the pattern.
>>>
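
The "check once, then pick a loop" idea quoted above can be sketched roughly as follows. This is not the code from the patch: count_candidates and utf8_char_len are hypothetical names, UTF-8 is assumed for stepping over characters, and the function only counts the text positions at which the rest of the pattern would be tried after a '%'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 character from its lead byte. */
static size_t
utf8_char_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: step a single byte */
}

/*
 * Hypothetical sketch: p points just past a '%' in the pattern.  Count the
 * positions in t where matching the rest of the pattern would be attempted.
 * The test for '\\' and '_' is done once, outside the scanning loops.
 */
static size_t
count_candidates(const char *t, size_t tlen, const char *p, size_t plen)
{
    size_t      n = 0;
    size_t      i = 0;

    assert(t != NULL && p != NULL);

    if (plen == 0)
        return 0;               /* trailing '%': nothing left to line up */

    if (*p == '\\' || *p == '_')
    {
        /* Wildcard or escape follows: every character start is a candidate. */
        while (i < tlen)
        {
            n++;
            i += utf8_char_len((unsigned char) t[i]);
        }
    }
    else
    {
        /* Literal follows: compare raw bytes against its first byte. */
        for (i = 0; i < tlen; i++)
        {
            if (t[i] == *p)
                n++;
        }
    }
    return n;
}
```

With the text "abcabc" and the pattern remainder "b", the byte-wise loop finds two candidates; with a remainder starting with '_', the character-wise loop visits each character once, however many bytes it occupies.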
>>
>>
>> Are you sure? The big remaining char-matching bottleneck will surely
>> be in the code that scans for a place to start matching a %. But
>> that's exactly where we can't use byte matching for cases where the
>> charset might include AB and BA as characters - the pattern might
>> contain %BA and the string AB. However, this isn't a danger for UTF8,
>> which leads me to think that we do indeed need a special case for
>> UTF8, but for a different improvement from that proposed in the
>> original patch. I'll post an updated patch shortly.
>>
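
A small sketch of the AB/BA hazard described above, under stated assumptions: find_bytes and utf8_is_char_start are hypothetical helpers, and the two-byte charset in which the byte pairs AB and BA are distinct characters is invented for illustration. A plain byte search can report a match that straddles two characters in such a charset. In UTF-8, continuation bytes always have the form 10xxxxxx and lead bytes never do, so a match for a whole character cannot begin in the middle of another one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain byte-wise search: offset of the first match, or -1 if none. */
static long
find_bytes(const unsigned char *hay, size_t hlen,
           const unsigned char *needle, size_t nlen)
{
    size_t      i;

    assert(hay != NULL && needle != NULL);

    if (nlen == 0 || nlen > hlen)
        return -1;
    for (i = 0; i + nlen <= hlen; i++)
    {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long) i;
    }
    return -1;
}

/* In UTF-8 a byte starts a character unless it looks like 10xxxxxx. */
static int
utf8_is_char_start(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}
```

For the hypothetical two-byte string AB AB (bytes 'A','B','A','B') and the single character BA, find_bytes reports offset 1, which is not a character boundary, so the match is spurious. For the UTF-8 bytes E2 82 AC C3 A9 and the needle C3 A9, it reports offset 3, and the byte at that offset passes utf8_is_char_start.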
>
> Here is a patch that implements this. Please analyse for possible
> breakage.
>
> cheers
>
> andrew
>

Attachment	Content-Type	Size
utf8.patch	text/x-patch	25.8 KB

In response to

Re: UTF8MatchText at 2007-05-20 14:11:16 from Andrew Dunstan

Responses

Re: UTF8MatchText at 2007-05-20 16:58:05 from Tom Lane
