UTF8MatchText

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	"Andrew - Supernews" <andrew(at)supernews(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject:	UTF8MatchText
Date:	2007-04-02 04:56:04
Message-ID:	20070402133445.DDF8.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists:	pgsql-hackers pgsql-patches

"Andrew - Supernews" <andrew(at)supernews(dot)net> wrote:

> ITAGAKI> I think all "safe ASCII-supersets" encodings are comparable
> ITAGAKI> by bytes, not only UTF-8.
>
> This is false, particularly for EUC.

Umm, I see. I have updated the optimization so that it is used only in the UTF8 case.
I also added some inlining hints that are useful on my machine (Pentium 4).

1000 runs of LIKE '%foo%' on 10000-row tables [ms]

 encoding  | HEAD  |  P1   |  P2   |  P3
-----------+-------+-------+-------+-------
 SQL_ASCII |  7094 |  7120 |  7063 |  7031
 LATIN1    |  7083 |  7130 |  7057 |  7031
 UTF8      | 17974 | 10859 | 10839 |  9682
 EUC_JP    | 17032 | 17557 | 17599 | 15240

- P1: UTF8MatchText()
- P2: P1 + __inline__ GenericMatchText()
- P3: P2 + __inline__ wchareq()
(The attached patch is P3.)
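
For readers without the attachment, the idea behind P1 can be sketched roughly as below. This is only an illustration, not the code in utf8matchtext.patch: the names utf8_next, match_bytes, and utf8_like are hypothetical, escape characters are not handled, and the input is assumed to be valid UTF-8. Literal pattern bytes are compared one byte at a time, which should be safe in UTF-8 because a continuation byte (10xxxxxx) can never be mistaken for the first byte of a character; only '_' and '%' need to step by whole characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 character starting at s (hypothetical helper).
 * Takes the first byte, then any continuation bytes of the form 10xxxxxx. */
static size_t utf8_next(const unsigned char *s, size_t len)
{
    size_t n = 1;

    assert(len > 0);
    while (n < len && (s[n] & 0xC0) == 0x80)
        n++;
    return n;
}

/* Simplified LIKE matcher: '%' matches any run of characters, '_' matches
 * exactly one character, everything else is compared as raw bytes. */
static bool match_bytes(const unsigned char *t, size_t tlen,
                        const unsigned char *p, size_t plen)
{
    while (plen > 0)
    {
        if (*p == '%')
        {
            p++;
            plen--;
            if (plen == 0)
                return true;        /* trailing '%' matches the rest */

            /* Try the rest of the pattern at each character boundary. */
            for (;;)
            {
                size_t n;

                if (match_bytes(t, tlen, p, plen))
                    return true;
                if (tlen == 0)
                    return false;
                n = utf8_next(t, tlen);
                t += n;
                tlen -= n;
            }
        }

        if (tlen == 0)
            return false;           /* pattern left over, text exhausted */

        if (*p == '_')
        {
            /* '_' consumes one whole character, however many bytes it has. */
            size_t n = utf8_next(t, tlen);

            t += n;
            tlen -= n;
        }
        else
        {
            /* Literal: a plain byte comparison is enough in UTF-8. */
            if (*p != *t)
                return false;
            t++;
            tlen--;
        }
        p++;
        plen--;
    }
    return tlen == 0;
}

/* Convenience wrapper for NUL-terminated strings (hypothetical name). */
bool utf8_like(const char *text, const char *pattern)
{
    return match_bytes((const unsigned char *) text, strlen(text),
                       (const unsigned char *) pattern, strlen(pattern));
}
```

With this sketch, utf8_like("xxfooxx", "%foo%") is true, and a pattern of "__" matches a text made of two three-byte characters. The same byte-wise shortcut is not valid for EUC encodings, as Andrew pointed out, because there a trailing byte can have the same value as a leading byte.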

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment	Content-Type	Size
utf8matchtext.patch	application/octet-stream	17.4 KB

In response to

Multibyte LIKE optimization at 2007-03-30 08:40:08 from ITAGAKI Takahiro

Responses

Re: UTF8MatchText at 2007-04-02 22:54:43 from Bruce Momjian
Re: UTF8MatchText at 2007-04-07 04:19:28 from Bruce Momjian
Re: UTF8MatchText at 2007-04-09 20:16:10 from Bruce Momjian
