From: | Sushant Sinha <sushant354(at)gmail(dot)com> |
---|---|
To: | Wojciech Knapik <webmaster(at)wolniartysci(dot)pl> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Very bad FTS performance with the Polish config |
Date: | 2009-11-19 04:29:12 |
Message-ID: | 9fb559330911182029p67e5d282r1941d929ceb66246@mail.gmail.com |
Lists: | pgsql-hackers |
ts_headline calls the equivalent of ts_lexize to break up the text. Of course,
there is also an algorithm that processes the tokens and generates the headline.
I would be really surprised if that algorithm were somehow dependent on the
language, as it only processes the tokens. So Oleg is right when he says
ts_lexize is something to be checked.
I will try to replicate what you are doing, but in the meantime can you run
the same ts_headline call under psql several times and paste the results?
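
To make the scaling argument in the quoted message below concrete, here is a minimal sketch in Python. It only uses the slowdown factors quoted in the thread (4x, 6x and 7.5x for 1, 5 and 10 repetitions of the text). The function names and the per-repetition costs are hypothetical, chosen for illustration; they are not measurements. If the dictionaries added only a fixed cost per word, the polish/english ratio should stay flat as the text grows, and the reported numbers do not.

```python
# Slowdown factors quoted in the thread: number of repetitions of the
# lorem ipsum text -> (time with 'polish') / (time with 'english').
reported_slowdown = {1: 4.0, 5: 6.0, 10: 7.5}


def linear_model_slowdown(repetitions, english_cost=1.0, polish_cost=4.0):
    """Slowdown predicted if each config costs a fixed amount per repetition.

    english_cost and polish_cost are hypothetical per-repetition costs,
    chosen so the model matches the reported factor for one repetition.
    """
    return (polish_cost * repetitions) / (english_cost * repetitions)


def is_flat(slowdowns, tolerance=0.05):
    """True if all slowdown factors agree within the relative tolerance."""
    values = list(slowdowns)
    return max(values) - min(values) <= tolerance * min(values)


# Under the fixed-cost-per-word model the ratio does not depend on length.
predicted = {n: linear_model_slowdown(n) for n in reported_slowdown}

for n in sorted(reported_slowdown):
    print(f"{n:2d} repetitions: reported {reported_slowdown[n]:.1f}x, "
          f"linear model {predicted[n]:.1f}x")

print("linear model flat:", is_flat(predicted.values()))
print("reported flat:", is_flat(reported_slowdown.values()))
```

A growing ratio like the reported one suggests that something beyond per-word dictionary overhead is involved, which is why repeated timings of the same call would help narrow it down.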
-Sushant.
2009/11/19 Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>
>
> Oleg Bartunov wrote:
>
>>> Yes, for 4-word texts the results are similar.
>>> Try that with a longer text and the difference becomes more and more
>>> significant. For the lorem ipsum text, 'polish' is about 4 times slower,
>>> than 'english'. For 5 repetitions of the text, it's 6 times, for 10
>>> repetitions - 7.5 times...
>>>
>>
>> Again, I see nothing unclear here, since dictionaries (as specified
>> in configuration) apply to ALL words in document. The more words in
>> document, the more overhead.
>>
>
> You're missing the point. I'm not surprised that the function takes more
> time for larger input texts - that's obvious. The thing is, the computation
> times rise more steeply when the Polish config is used. Steeply enough, that
> the difference between the Polish and English configs becomes enormous in
> practical cases.
>
> Now this may be expected behaviour, but since I don't know if it is, I
> posted to the mailing lists to find out. If you're saying this is ok and
> there's nothing to fix here, then there's nothing more to discuss and we may
> consider the thread closed.
> If not, ts_headline deserves a closer look.
>
> cheers,
> Wojciech Knapik