From: | bricklen <bricklen(at)gmail(dot)com> |
---|---|
To: | Zev Benjamin <zev-pgsql(at)strangersgate(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Full text search on partial URLs |
Date: | 2013-11-06 19:04:07 |
Message-ID: | CAGrpgQ_QoB4fYMPSZEb_N77ai6DtPknLmbLqMtc04=o_j0-qMA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
<zev-pgsql(at)strangersgate(dot)com>wrote:
> On 11/06/2013 01:47 PM, bricklen wrote:
>
>>
>> On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
>> <zev-pgsql(at)strangersgate(dot)com <mailto:zev-pgsql(at)strangersgate(dot)com>> wrote:
>>
>> Hi,
>>
>> I have Postgres full text search set up for my application and it's
>> been working great! However, my users would like their searches to
>> turn up parts of URLs. For example, they would like a search for
>> "foobar" to turn up a document that contains the string
>> "http://example.com/foobar/__blah <http://example.com/foobar/blah>"
>> (and similarly for queries like "example" and "blah). With the
>> default dictionaries for host, url, and url_path, the search query
>> would have to contain the complete host or url path.
>>
>> What is the best way to accomplish this? Should I be looking at
>> building a custom dictionary that breaks down hosts and urls or is
>> there something simpler I can do?
>>
>>
>> Have you looked into trigrams?
>> http://www.postgresql.org/docs/current/static/pgtrgm.html
>>
>
> I've looked at it in the context of adding fuzzy search. But my
> understanding is that doing a fuzzy search here would only work if the
> query were a significant fraction of, say, the url path. For example, I
> would expect a fuzzy search of "foobar" on "/foobar/x" to return a high
> similarity, but a fuzzy search of "foobar" on "/foobar/some/very/long/path/x"
> to have a low similarity.
>
> Or are you suggesting using trigrams in a different way?
>
Yeah, I was thinking more along the lines of allowing wildcard searching,
not similarity.
Eg.
CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST ( yourcol
gist_trgm_ops );
select * from yourtable where yourcol ~~ '%foobar%';
From | Date | Subject | |
---|---|---|---|
Next Message | Zev Benjamin | 2013-11-06 19:07:40 | Re: Full text search on partial URLs |
Previous Message | Zev Benjamin | 2013-11-06 18:53:03 | Re: Full text search on partial URLs |