From: | Zev Benjamin <zev-pgsql(at)strangersgate(dot)com> |
---|---|
To: | bricklen <bricklen(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Full text search on partial URLs |
Date: | 2013-11-06 19:07:40 |
Message-ID: | 527A937C.9000703@strangersgate.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 11/06/2013 02:04 PM, bricklen wrote:
>
> On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
> <zev-pgsql(at)strangersgate(dot)com <mailto:zev-pgsql(at)strangersgate(dot)com>> wrote:
>
> On 11/06/2013 01:47 PM, bricklen wrote:
>
>
> On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
> <zev-pgsql(at)strangersgate(dot)com
> <mailto:zev-pgsql(at)strangersgate(dot)com>
> <mailto:zev-pgsql(at)__strangersgate(dot)com
> <mailto:zev-pgsql(at)strangersgate(dot)com>>> wrote:
>
> Hi,
>
> I have Postgres full text search set up for my application
> and it's
> been working great! However, my users would like their
> searches to
> turn up parts of URLs. For example, they would like a
> search for
> "foobar" to turn up a document that contains the string
> "http://example.com/foobar/____blah
> <http://example.com/foobar/__blah>
> <http://example.com/foobar/__blah <http://example.com/foobar/blah>>"
> (and similarly for queries like "example" and "blah). With the
> default dictionaries for host, url, and url_path, the
> search query
> would have to contain the complete host or url path.
>
> What is the best way to accomplish this? Should I be
> looking at
> building a custom dictionary that breaks down hosts and
> urls or is
> there something simpler I can do?
>
>
> Have you looked into trigrams?
> http://www.postgresql.org/__docs/current/static/pgtrgm.__html
> <http://www.postgresql.org/docs/current/static/pgtrgm.html>
>
>
> I've looked at it in the context of adding fuzzy search. But my
> understanding is that doing a fuzzy search here would only work if
> the query were a significant fraction of, say, the url path. For
> example, I would expect a fuzzy search of "foobar" on "/foobar/x" to
> return a high similarity, but a fuzzy search of "foobar" on
> "/foobar/some/very/long/path/__x" to have a low similarity.
>
> Or are you suggesting using trigrams in a different way?
>
>
> Yeah, I was thinking more along the lines of allowing wildcard
> searching, not similarity.
>
> Eg.
> CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST (
> yourcol gist_trgm_ops );
> select * from yourtable where yourcol ~~ '%foobar%';
>
Hrm. That might work. So the application-level search functionality
would be the union of tsearch and trigram wildcard matching.
If anyone else has other ideas, I'd be interested in hearing them as well.
Thanks,
Zev
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Ross | 2013-11-06 19:26:08 | Re: After upgrade to 9.3, streaming replication fails to start |
Previous Message | bricklen | 2013-11-06 19:04:07 | Re: Full text search on partial URLs |