Re: FTS performance with the Polish config

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kenneth Marshall <ktm(at)rice(dot)edu>, Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>, pgsql-performance(at)postgresql(dot)org
Subject: Re: FTS performance with the Polish config
Date: 2009-11-15 14:06:42
Message-ID: Pine.LNX.4.64.0911151702010.6801@sn.sai.msu.ru
Lists: pgsql-performance

On Sun, 15 Nov 2009, Pavel Stehule wrote:

>
> czech stemmer doesn't exist :(
>

I'd try Morfessor (http://www.cis.hut.fi/projects/morpho/), which is an
unsupervised morphological dictionary. I think it would not be very hard to add
a Morfessor dictionary template to tsearch2, so people could create their
own stemmers.

>>
>> An Ispell dictionary (English or any other language) is slow on the first
>> load and is then cached, so there is no problem if you use persistent
>> database connections, which are the de facto standard for any serious project.
>>
>
> I agree that connection pooling is one solution. But is it a good one?
> Can't we share the dictionary better?

We have thought about this issue and have some ideas. Teodor can explain them
more clearly, since I don't remember all the details.
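
To make the first-load point concrete, here is a toy sketch in Python. It is
not PostgreSQL code: the Session class, the 50 ms load cost and the two-word
dictionary are all made up for illustration. It only models the behaviour
described above, where each session loads the dictionary lazily on first use
and keeps it cached for the rest of that session.

```python
import time


class Session:
    """Toy stand-in for one database connection (hypothetical, not a real API)."""

    def __init__(self, load_cost_s=0.05):
        self._load_cost_s = load_cost_s  # assumed dictionary load time
        self._dictionary = None          # loaded lazily, cached per session
        self.loads = 0                   # how many times this session loaded it

    def _get_dictionary(self):
        if self._dictionary is None:
            time.sleep(self._load_cost_s)  # simulate parsing the Ispell files
            self._dictionary = {"searching": "search", "engines": "engine"}
            self.loads += 1
        return self._dictionary

    def lexize(self, word):
        """Return the normalized form of word, or word itself if unknown."""
        return self._get_dictionary().get(word, word)


def run_with_new_session_per_query(words):
    """One short-lived session per query: every query pays the load cost."""
    total_loads = 0
    for word in words:
        session = Session()
        session.lexize(word)
        total_loads += session.loads
    return total_loads


def run_with_persistent_session(words):
    """One long-lived session: only the first query pays the load cost."""
    session = Session()
    for word in words:
        session.lexize(word)
    return session.loads


if __name__ == "__main__":
    words = ["searching", "engines", "searching", "unknown"]
    print("loads, new session per query:", run_with_new_session_per_query(words))
    print("loads, persistent session:  ", run_with_persistent_session(words))
```

With a connection pool the application behaves like the second function, so
the load cost is paid once per pooled connection rather than once per request.
Sharing one loaded dictionary across all sessions would remove even that cost,
but this sketch does not attempt to model it.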

>
>>>
>>> Pavel
>>>
>>>> Oleg
>>>> On Sat, 14 Nov 2009, Pavel Stehule wrote:
>>>>
>>>>> 2009/11/14 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>>>>>
>>>>>> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
>>>>>>>
>>>>>>> On Sat, Nov 14, 2009 at 12:25:05PM +0100, Wojciech Knapik wrote:
>>>>>>>>
>>>>>>>> I just finished implementing a "search engine" for my site and found
>>>>>>>> ts_headline extremely slow when used with a Polish tsearch
>>>>>>>> configuration, while fast with English.
>>>>>>
>>>>>>> The documentation for ts_headline() states:
>>>>>>> ts_headline uses the original document, not a tsvector summary, so it
>>>>>>> can be slow and should be used with care.
>>>>>>
>>>>>> That's true but the argument in the docs would apply just as well to
>>>>>> english or any other config.  So while Wojciech would be well advised
>>>>>> to try to avoid making a lot of calls to ts_headline, it's still curious
>>>>>> that it's so much slower in polish than english.  Could we see a
>>>>>> self-contained test case?
>>>>>
>>>>> Is it dictionary-based or stem-based?
>>>>>
>>>>> Dictionary-based FTS is very slow on first load. At least the Czech FTS
>>>>> is slow.
>>>>>
>>>>> regards
>>>>> Pavel Stehule
>>>>>
>>>>>>
>>>>>>                        regards, tom lane
>>>>>>
>>>>>> --
>>>>>> Sent via pgsql-performance mailing list
>>>>>> (pgsql-performance(at)postgresql(dot)org)
>>>>>> To make changes to your subscription:
>>>>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>>>>
>>>>>
>>>>
>>>>        Regards,
>>>>                Oleg
>>>> _____________________________________________________________
>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>
>>>
>>
>>        Regards,
>>                Oleg
>

Regards,
Oleg
