| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Andrew Dunstan <andrew(at)dunslane(dot)net> | 
| Cc: | Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org | 
| Subject: | Re: UTF8MatchText | 
| Date: | 2007-05-20 16:58:05 | 
| Message-ID: | 2132.1179680285@sss.pgh.pa.us | 
| Lists: | pgsql-hackers pgsql-patches | 
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Are you sure? The big remaining char-matching bottleneck will surely 
> be in the code that scans for a place to start matching a %. But 
> that's exactly where we can't use byte matching for cases where the 
> charset might include AB and BA as characters - the pattern might 
> contain %BA and the string AB. However, this isn't a danger for UTF8, 
> which leads me to think that we do indeed need a special case for 
> UTF8, but for a different improvement from that proposed in the 
> original patch. I'll post an updated patch shortly.
> Here is a patch that implements this. Please analyse for possible 
> breakage.

On the strength of this analysis, shouldn't we drop the separate
UTF8 match function and just use SB_MatchText for UTF8?
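
To make the quoted concern concrete, here is a minimal sketch of why bytewise scanning is unsafe in some multibyte encodings but safe in UTF8. It is not taken from the patch: the helper names (byte_find, utf8_is_char_start) are hypothetical, and Shift-JIS is used purely as an assumed example of an encoding where a trailing byte can collide with an ordinary character.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the byte offset of the first bytewise
 * occurrence of needle in hay, or -1 if there is none.  This knows
 * nothing about character boundaries.
 */
static int
byte_find(const unsigned char *hay, size_t hlen,
          const unsigned char *needle, size_t nlen)
{
    for (size_t i = 0; i + nlen <= hlen; i++)
    {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (int) i;
    }
    return -1;
}

/*
 * In UTF8, continuation bytes always look like 10xxxxxx, so any byte
 * that does not have that form begins a character.
 */
static int
utf8_is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Sample data; the choice of Shift-JIS is an assumption for illustration. */
static const unsigned char sjis_ka_a[] = {0x83, 0x41};       /* KATAKANA A, Shift-JIS */
static const unsigned char utf8_ka_a[] = {0xE3, 0x82, 0xA2}; /* KATAKANA A, UTF8 */
static const unsigned char ascii_A[] = {0x41};               /* plain 'A' */
```

Searching sjis_ka_a for ascii_A with byte_find reports a hit at offset 1, which is the second byte of a two-byte character, so it is a false match. The same search against utf8_ka_a finds nothing, because 0x41 can never appear as a continuation byte. A well-formed UTF8 pattern always begins with a byte for which utf8_is_char_start is true, so a bytewise hit can only begin on a character boundary.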
It strikes me that we may be overcomplicating matters in another way
too.  If you believe that the %-scan code is now the bottleneck, that
is, the key loop is where we have pattern '%foo' and we are trying to
match 'f' to each successive data position, then you should be bothered
that SB_MatchTextIC is applying tolower() to 'f' again for each data
character.  Worst-case we could have O(N^2) applications of tolower()
during a match.  I think there's a fair case to be made that we should
get rid of SB_MatchTextIC and implement *all* the case-insensitive
variants by means of an initial lower() call.  This would leave us with
just two match functions and allow considerable unification of the setup
logic.
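
As a rough illustration of the cost difference, here is a simplified sketch, not the actual SB_MatchTextIC code. It models only the scan for a position where the literal after % starts to match, and it counts case-folding calls. All names here (fold, fold_calls, scan_fold_each_time, scan_fold_once) are hypothetical, and single-byte characters are assumed.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the two strategies can be compared. */
static size_t fold_calls = 0;

static unsigned char
fold(unsigned char c)
{
    fold_calls++;
    return (unsigned char) tolower(c);
}

/*
 * Strategy A: fold both sides at every comparison.  The pattern bytes
 * get folded again for each data position that is tried.
 */
static int
scan_fold_each_time(const char *data, size_t dlen,
                    const char *pat, size_t plen)
{
    for (size_t i = 0; i + plen <= dlen; i++)
    {
        size_t j = 0;

        while (j < plen &&
               fold((unsigned char) data[i + j]) == fold((unsigned char) pat[j]))
            j++;
        if (j == plen)
            return (int) i;
    }
    return -1;
}

/*
 * Strategy B: fold data and pattern once up front, then scan with a
 * plain case-sensitive byte comparison.
 */
static int
scan_fold_once(const char *data, size_t dlen,
               const char *pat, size_t plen)
{
    unsigned char *d = malloc(dlen ? dlen : 1);
    unsigned char *p = malloc(plen ? plen : 1);
    int         result = -1;

    if (d == NULL || p == NULL)
    {
        free(d);
        free(p);
        return -1;
    }
    for (size_t i = 0; i < dlen; i++)
        d[i] = fold((unsigned char) data[i]);
    for (size_t i = 0; i < plen; i++)
        p[i] = fold((unsigned char) pat[i]);

    for (size_t i = 0; i + plen <= dlen; i++)
    {
        if (memcmp(d + i, p, plen) == 0)
        {
            result = (int) i;
            break;
        }
    }
    free(d);
    free(p);
    return result;
}
```

With the data "aaaaaaaaAB" and the pattern "aab", both functions find the match at offset 7, but the first makes 48 calls to fold() while the second makes 13 (one per data byte plus one per pattern byte). The second approach pays for two temporary copies; whether that trade is worthwhile in the real code would need measuring.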
regards, tom lane