From: | Sagiv Some <sagivsome(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | phraseto_tsquery design |
Date: | 2018-06-21 15:02:32 |
Message-ID: | CADcPEWBedG8jUG6yuSS4d8n7SsUM2fajH01Eprp27do1sAQEBA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I recently migrated over from SQL Server to Postgres (through Amazon
Aurora) for the cost.
I have been working with the full-text search functions for quite some
time, but have encountered a few major roadblocks in my quest to use
postgres full-text search functionality to its fullest. I am using it as a
blob text-search tool, where the entire document (could be up to 100 pages)
is indexed and searched on:
1. Limited table and document sizes. In SQL server, I can create a 1.5 TB
table no problem (filingid, data blobs). However, in postgres, this is not
possible - I am limited to approximately 100-200GB per table and 1MB per
tsvector. This is an annoyance, but can get around it.
2. However, it seems impossible to bypass the performance problem of phrase
searching. I conduct quite a bit of phrase searching, and although
postgres' "phraseto_tsquery" performs great on phrases with uncommon words,
it slows to a screeching halt on phrases with common words such as "law
firm" or, for example, "bank of america". This is a huge problem, because
"plainto_tsquery" performs just fine on these but as I understand it,
phrase searching is built to do a scan after finding each word using
"plainto"?
There are already positions and the "plainto" function is quite fast; is
there a way to modify the "phraseto" query to perform a useful and fast
search that looks for the distance between found words appropriately?
Seems fairly trivial to modify the function this way, but I didn't see it
on the development roadmap. It's a pretty critical piece and I have seen
that other people have had issues with it too, and a test between this
function and some other similar functions show that the performance of the
phraseto function is quite poor.
Thanks,
Sagiv
sagivsome(at)gmail(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2018-06-21 15:09:27 | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) |
Previous Message | Robbie Harwood | 2018-06-21 14:56:47 | Re: libpq compression |