Re: [GENERAL] FTS query, statistics and planner estimations…

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Pierre Ducroquet <pierre(dot)ducroquet(at)people-doc(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] FTS query, statistics and planner estimations…
Date: 2016-11-09 10:41:47
Message-ID: CAFj8pRB+dws_D4ocfrTU==GWCxd8TGr_SEhym+4uK3GoZhuNAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2016-11-09 11:19 GMT+01:00 Pierre Ducroquet <pierre(dot)ducroquet(at)people-doc(dot)com
>:

> On Wednesday, November 9, 2016 10:40:10 AM CET Francisco Olarte wrote:
> > Pierre:
> >
> > On Wed, Nov 9, 2016 at 10:22 AM, Pierre Ducroquet
> >
> > <pierre(dot)ducroquet(at)people-doc(dot)com> wrote:
> > > The query does a few joins «after» running a FTS query on a main table.
> > > The FTS query returns a few thousand rows, but the estimations are
> wrong,
> > > leading the optimizer to terrible plans compared to what should happen,
> > > and
> > > thus creates a far higher execution time.
> >
> > ....
> >
> > > but the issue remain the same. The table contains about 295,000
> documents,
> > > and
> > ....
> >
> > > Request | Estimated rows | Real rows
> > >
> > > ----------------------------------+----------------+-----------
> > > 'word1' | 38050 | 37500
> > > 'word1 word2' | 4680 | 32000
> > > 'word1 word2 word3' | 270 | 12300
> > > 'word1 word2 word3 word4' | 10 | 9930
> > > 'word1 word2 word3 word4 word5' | 1 | 9930
> > >
> > > You can see that with more words in query, the estimation falls far
> behind
> > > reality.
> >
> > I'm not really familiar with FTS but, doing a few division of
> > estimations and rows it seems it estimates as uncorrelated words, and
> > you real rows clearly indicate some of them are clearly correlated (
> > like w1/w2 and w4/s5, and partially w3/w45 ) and very common.
> >
> > > Is that a known limitation of the FTS indexing ? Am I missing something
> > > obvious, or a poor configuration ?
> >
> > Someone more familiar with it needed for that, but what I've found
> > several times is FTS does not mix too well with relational queries at
> > the optimizer level ( as FTS terms can have very diverse degrees of
> > correlation, which is very difficult to store in the statistics a
> > relational optimizer normally uses ).
>
> Indeed the words in the query are correlated, but I do hope that the FTS
> indexing is able to cope with that. Otherwise it makes it far less usable
> than
> what one would expect since real world queries will often contain
> sentences or
> related words. Also, PostgreSQL 9.6 introduced phrase search in FTS, and I
> don't see how that would work without a working multi-words query.
>

The PostgreSQL statistics are not multidimensional - so bad estimation is
expected :(

Regards

Pavel

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Francisco Olarte 2016-11-09 10:44:54 Re: [GENERAL] FTS query, statistics and planner estimations…
Previous Message Pierre Ducroquet 2016-11-09 10:19:47 Re: FTS query, statistics and planner estimations…