From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: string_to_array eats too much memory? |
Date: | 2006-11-08 16:34:39 |
Message-ID: | Pine.GSO.4.63.0611081928240.8413@ra.sai.msu.su |
Lists: | pgsql-hackers |
On Thu, 9 Nov 2006, Tatsuo Ishii wrote:
>>> The problem with Japanese is that it's an agglutinative language, and we
>>> need to separate each word from a sentence. So I need to modify tsearch2
>>> anyway (I know someone from Japan is working on this).
>> https://www.oss.ecl.ntt.co.jp/tsearch2j/index.html
>> Is that it?
>
> Yes. However, I'm going to use a different "word separation" library from
> theirs and will make some tweaks.
>
>>> BTW, can tsearch2 handle ~70k words in a document?
>>
>> I don't see any problem.
>
> Great. I ran a small test, and it seems tsearch2 works great
> with GIN.
Tatsuo, ideally, I'd like to keep tsearch2 untouched, with the
Japanese parser(s) and dictionary programs available as plug-ins. This is how
tsearch2 was designed. If something prevents us from doing so, we should improve
tsearch2. This is important now, since we're going to build tsearch2 into the
PostgreSQL core for 8.3.
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83