Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres

From: "Bayer, Samuel" <sam(at)mitre(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
Date: 2022-03-04 16:39:39
Message-ID: 7ee2afc2-dcf7-2bc9-3092-8ca58ed2b880@mitre.org
Lists: pgsql-general

I've tried both ranking functions. I've tried a variety of the normalization settings. I'm using the standard English language configuration. Postgres 13.

I do understand your FTS philosophy - I suppose I'm looking for guidance about how best to approximate the search capability in Solr using the FTS pieces you have. One concrete question, I suppose, is: the classic TF/IDF search strategy relies on inverse document frequency, which looks across the corpus. I can't tell whether that corpus-wide frequency information is taken into account in either ranking function.
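
To make the question concrete, here is a minimal sketch of classic TF/IDF scoring in Python. It is only an illustration of the textbook formula, not a description of how Solr or either Postgres ranking function works; the function names and the toy corpus are made up for the example. The point is that the IDF term cannot be computed from a single document: it needs a count across every document in the corpus.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency: log(N / df).

    df is the number of documents containing the term, so this value
    depends on the whole corpus, not on any one document.
    """
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    """Term frequency within one document, scaled by corpus-wide IDF."""
    tf = Counter(doc)[term] / len(doc)
    return tf * idf(term, docs)

# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    "postgres full text search".split(),
    "solr full text search ranking".split(),
    "postgres ranking functions".split(),
]

# "solr" appears in one document, "full" in two, so "solr" gets more weight.
rare = tf_idf("solr", corpus[1], corpus)
common = tf_idf("full", corpus[1], corpus)
```

Both terms occur once in the second document, so their term frequencies are equal; the rarer term scores higher only because of the corpus-wide document count.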

I don't know if Solr weights earlier tokens more heavily, but I wouldn't be surprised if it does.

On 3/4/22 11:09 AM, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> On Fri, Mar 4, 2022 at 10:41:16AM -0500, Bayer, Samuel wrote:
>>> I apologize for not being able to be more specific.
>
>> I know it is hard to quantify. Is it possible that Postgres is treating
>> all the terms equally, while Solr is prioritizing terms that are earlier
>> in the document?
>
> A few basic questions:
>
> * which ranking function are you using?
>
> * with what options?
>
> * which PG version exactly?
>
> As far as I can see from a quick look at the docs, neither
> ts_rank() nor ts_rank_cd() considers "earlier in the document"
> to be an interesting consideration. They do have the ability
> to prefer terms that have been marked as having a higher weight,
> but you'd need to do some setup work to make that useful ---
> basically, you'd have to separate out the title or other metadata
> and apply setweight() to it while building the tsvectors.
>
> I wouldn't be surprised if Solr has some well-tuned default
> heuristics that mean that you don't have to work hard to get
> good results from it. The current state of our FTS features
> is more like "here's all the parts, but you have to build the
> behavior you want".
>
> ISTM that our FTS features have basically been on autopilot
> since they went in. I'd sort of hoped that we'd see more
> parsers, more ranking functions, etc, over time ... but nothing
> like that has happened. I'm not sure if that's just lack of
> interest, or if people find the code too difficult to work with.
>
> regards, tom lane
