Re: improve Chinese locale performance

From: Quan Zongliang <quanzongliang(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improve Chinese locale performance
Date: 2013-07-22 09:15:39
Message-ID: 51ECF83B.1010303@gmail.com
Lists: pgsql-hackers

On 07/22/2013 03:54 PM, Craig Ringer wrote:
> On 07/22/2013 12:17 PM, Quan Zongliang wrote:
>> Hi hackers,
>>
>> I tried to improve performance when the database uses a Chinese locale.
>>
>> Under openSUSE, creating an index on a table with 54996 rows takes:
>> locale=C, 140ms
>> locale=zh_CN, 985ms
>>
>> I think the strcoll() function on Linux is too slow.
>> So I made a new UTF-8 to GB18030 map that stores the Chinese sort order,
>> and no longer call strcoll().
>> With my modified code, the same operation with locale=zh_CN takes 203ms.
>
> It might be worth looking at gcc's strcoll() implementation. See if it
> performs better when you use the latest gcc, and if not try to improve
> gcc's strcoll().
>
> I'd be interested in seeing a test case for this that shows that the
> results of your new collation are exactly the same as the original
> strcoll() based approach.
>
They are not exactly the same.
I found some errors in gcc's strcoll() when ordering by Chinese characters.
Chinese has many special characters, and gcc's strcoll() either does not
take them into account or handles only part of them.
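
The kind of comparison asked for above could be sketched roughly as
follows. This is only an illustration, not code from the patch: the helper
names (find_disagreements, codepoint_cmp, compare_with_strcoll) are
hypothetical, and the zh_CN.UTF-8 locale name is an assumption that must
match a locale installed on the test machine.

```python
import itertools
import locale


def sign(n):
    """Reduce a three-way comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_disagreements(cmp_a, cmp_b, samples):
    """Return the pairs that the two comparators order differently."""
    return [
        (x, y)
        for x, y in itertools.combinations(samples, 2)
        if sign(cmp_a(x, y)) != sign(cmp_b(x, y))
    ]


def codepoint_cmp(a, b):
    """Stand-in comparator: plain code point order, as with locale=C."""
    return (a > b) - (a < b)


def compare_with_strcoll(custom_cmp, samples, locale_name="zh_CN.UTF-8"):
    """Check custom_cmp against the C library's strcoll().

    Raises locale.Error if locale_name is not installed.
    """
    old = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return find_disagreements(custom_cmp, locale.strcoll, samples)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)
```

An empty list from compare_with_strcoll() would mean both orderings agree
on the sample; each returned pair is a case worth inspecting by hand.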

Yes, the best way is to improve gcc's strcoll().
But I don't know how to do that.
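
For reference, the table-based idea from the original message (look up a
stored order instead of calling strcoll()) can be sketched as below. This
is a simplified illustration and not the actual patch: the four-character
WEIGHTS table and the fallback rule for unlisted characters are
assumptions made for the example, while the real map is described as
covering UTF-8 to GB18030.

```python
# Hypothetical weight table: character -> rank in the desired order.
# Only four characters are listed; a real table would cover the whole
# character set.
WEIGHTS = {ch: rank for rank, ch in enumerate("阿波次地")}

# Characters missing from the table sort after all listed ones, ordered
# by code point (an assumption made for this sketch).
FALLBACK_BASE = len(WEIGHTS)


def sort_key(s):
    """Return a tuple of weights that can be compared without strcoll()."""
    return tuple(WEIGHTS.get(ch, FALLBACK_BASE + ord(ch)) for ch in s)


def table_cmp(a, b):
    """Three-way comparison in the style of strcoll()."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


# Code point order would give 地, 次, 波, 阿; the table reverses that.
words = ["地", "次", "波", "阿"]
sorted_words = sorted(words, key=sort_key)
```

Each comparison is then a few table lookups plus a tuple compare, with no
locale machinery involved.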

Thanks,
Quan Zongliang
