Re: FTS performance issue - planner problem identified (but only partially resolved)

From: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
To: Stefan Keller <sfkeller(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: FTS performance issue - planner problem identified (but only partially resolved)
Date: 2013-07-19 19:57:43
Message-ID: B6F6FD62F2624C4C9916AC0175D56D880CE0DD15@jenmbs01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
>
> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
> The planner obviously always chooses table scan

Hello,

A probable reason for the time difference is the cost for decompressing toasted content.
At lest in 8.3, the planner was not good at estimating it.

I'm getting better overall performances since I've stopped collect statistic on tsvectors.
An alternative would have been to disallow compression on them.

I'm aware this is a drastic way and would not recommend it without testing. The benefit may depend on the type of data you are indexing.
In our use case these are error logs with many java stack traces, hence with many lexemes poorly discriminative.

see: http://www.postgresql.org/message-id/27953.1329434125@sss.pgh.pa.us
as a comment on
http://www.postgresql.org/message-id/C4DAC901169B624F933534A26ED7DF310861B363@JENMAIL01.ad.intershop.net

regards,

Marc Mamin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stefan Keller 2013-07-19 23:55:29 Re: FTS performance issue - planner problem identified (but only partially resolved)
Previous Message Pavel Stehule 2013-07-19 16:16:30 Re: Fwd: Relatively high planner overhead on partitions?