From: | Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | Sagiv Some <sagivsome(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: phraseto_tsquery design |
Date: | 2018-06-21 16:44:00 |
Message-ID: | 20180621164358.GA19523@zakirov.localdomain |
Lists: | pgsql-hackers |
Hello,
On Thu, Jun 21, 2018 at 11:02:32AM -0400, Sagiv Some wrote:
> 2. However, it seems impossible to bypass the performance problem of phrase
> searching. I conduct quite a bit of phrase searching, and although
> postgres' "phraseto_tsquery" performs great on phrases with uncommon words,
> it slows to a screeching halt on phrases with common words such as "law
> firm" or, for example, "bank of america". This is a huge problem, because
> "plainto_tsquery" performs just fine on these but as I understand it,
> phrase searching is built to do a scan after finding each word using
> "plainto"?
>
> There are already positions and the "plainto" function is quite fast; is
> there a way to modify the "phraseto" query to perform a useful and fast
> search that looks for the distance between found words appropriately?
If I understood you correctly, you use a GIN index for text search.
Unfortunately, this isn't an issue with the phraseto_tsquery() function; it
is a characteristic of the GIN index.
A tsvector consists of lexemes and their positions retrieved from the text
document. A GIN index stores only the lexemes, which is fine for regular
search (using the plainto_tsquery() function). But phrase search (via
phraseto_tsquery()) also requires lexeme positions, so it does additional work:
- first, get all items from the index that satisfy the query (as in a regular
search)
- then, read the entire tsvector of each of those items from the heap
- finally, recheck all retrieved items and exclude those that don't satisfy
the phrase query
The last two steps are the additional work.
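The three steps above can be modelled in a few lines. This is a toy Python sketch, not PostgreSQL's implementation: the dictionary `HEAP` stands in for table rows holding tsvectors (lexeme -> positions), and `GIN` is a hypothetical inverted index that maps each lexeme to row ids only, with no positions. A phrase is treated as a list of lexemes that must occur at consecutive positions, a simplification of the `<->` operator that phraseto_tsquery() produces.

```python
from collections import defaultdict

# Toy "heap": each row's tsvector maps lexeme -> positions (illustrative data).
HEAP = {
    1: {"law": [1], "firm": [2], "bank": [5]},
    2: {"law": [3], "firm": [7]},
    3: {"firm": [1], "law": [4]},
}

def build_gin(heap):
    """Build a GIN-like index: lexeme -> set of row ids (no positions)."""
    index = defaultdict(set)
    for row_id, tsvector in heap.items():
        for lexeme in tsvector:
            index[lexeme].add(row_id)
    return index

GIN = build_gin(HEAP)

def plain_search(lexemes):
    """AND query (like plainto_tsquery): answered from the index alone."""
    result = set(GIN.get(lexemes[0], set()))
    for lexeme in lexemes[1:]:
        result &= GIN.get(lexeme, set())
    return result

def phrase_matches(tsvector, lexemes):
    """True if the lexemes occur at consecutive positions in the tsvector."""
    for start in tsvector.get(lexemes[0], []):
        if all(start + i in tsvector.get(lex, []) for i, lex in enumerate(lexemes)):
            return True
    return False

def phrase_search(lexemes):
    # Step 1: candidates from the index, exactly as in a regular search.
    candidates = plain_search(lexemes)
    # Steps 2 and 3: fetch each candidate's tsvector from the heap and
    # recheck positions; this is the extra work a phrase query pays for.
    return {row_id for row_id in candidates if phrase_matches(HEAP[row_id], lexemes)}

print(sorted(plain_search(["law", "firm"])))   # [1, 2, 3]
print(sorted(phrase_search(["law", "firm"])))  # [1]
```

With common words, step 1 returns many candidates that then fail the recheck, so most of the heap reads in steps 2 and 3 are wasted; that is the cost the question observes for phrases like "law firm".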
We provide our own index as an extension. It is a modified GIN index that can
store both lexemes and their positions, and therefore phrase queries are faster.
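A rough sketch of why storing positions in the index helps. This is a toy Python model, not the extension's actual code: the hypothetical `INDEX` maps each lexeme to row id -> positions, so the phrase check (consecutive positions, a simplification of `<->`) is done from index entries alone, without reading tsvectors from the heap.

```python
from collections import defaultdict

# Toy tsvectors (lexeme -> positions); used only to build the index.
ROWS = {
    1: {"law": [1], "firm": [2]},
    2: {"law": [3], "firm": [7]},
    3: {"firm": [1], "law": [4]},
}

def build_positional_index(rows):
    """Index storing lexemes *and* positions: lexeme -> {row_id: positions}."""
    index = defaultdict(dict)
    for row_id, tsvector in rows.items():
        for lexeme, positions in tsvector.items():
            index[lexeme][row_id] = set(positions)
    return index

INDEX = build_positional_index(ROWS)

def phrase_search(lexemes):
    """Match lexemes at consecutive positions using only index entries."""
    postings = [INDEX.get(lexeme, {}) for lexeme in lexemes]
    # Rows containing every lexeme (the same candidates a plain search finds).
    candidates = set(postings[0]).intersection(*postings[1:])
    result = set()
    for row_id in candidates:
        # Positions come from the index, so no heap access is needed.
        if any(all(start + i in postings[i][row_id] for i in range(len(lexemes)))
               for start in postings[0][row_id]):
            result.add(row_id)
    return result

print(sorted(phrase_search(["law", "firm"])))  # [1]
```

The trade-off in this model is a larger index (positions are stored per row), in exchange for skipping the heap reads that dominate phrase queries on common words.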
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company