From: | Aurynn Shaw <ashaw(at)commandprompt(dot)com> |
---|---|
To: | Jonathan <jharahush(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Full Text Search 101? |
Date: | 2009-11-12 21:29:46 |
Message-ID: | 4AFC7E4A.7080305@commandprompt.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Jonathan;
>
> I posted to this forum once before and was able to receive help.
> Thanks again!
>
> I'm trying to implement full text search capabilities. Basically, I
> have a very simple "data catalog" type of website (http://
> gis.drcog.org/datacatalog) where the user can type in a word or words
> to search for records that match the criteria. I have a table with a
> few fields, but I want to be able to search/index two fields -- the
> "name" of the dataset and the "description." (or more, if I can index
> fields from other tables too, that would be great). I'd like to be
> able to use a full text search to rank the results in terms of
> relevance.
You can set up an index per table, but I've found that having a single
vector table with bridge tables to the data I want to search works well.
This would be in the style of:
create table vectors (
id serial primary key,
vector tsvector not null
);
create table interesting_data (
id serial primary key,
textual text not null
);
CREATE TABLE vector_to_interesting (
v_id int not null references vectors(id),
i_id int not null references interesting_data(id)
);
And then
SELECT id.*
FROM interesting_data itd,
vectors v,
vector_to_interesting itv
WHERE v.id = itv.v_id
AND itv.i_id = itd.id
AND v.vector @@ to_tsquery('searchpattern');
Allowing for some procedures around that that return multiple row types,
to the client software.
>
> What is the best way to handle this? I've read through the PostgreSQL
> documentation and don't quite understand it although I'm trying to
> understand and am 'playing around' with this on a development server.
> Is the default text search configuration enough? I've tried setting my
> own configuration but get errors about not finding .dict dictionary
> files? I have a default install of PostgreSQL 8.4.
>
> Also, I've created a tsvector column and created indexes, but it
> didn't seem to delete stop words from the indexes. Should the stop
> words be indexed?
The default "english" configuration in PostgreSQL should have done this.
Building your own configuration tends towards being a bit more advanced,
and "english" should suit most needs.
>
> Also, I don't quite understand how to create indexes but rank certain
> words as being more important than others, for instance, maybe having
> words that come from the "name" column carrying more importance than
> words coming from the "description" column.
For part of this, there's ts_rank(tsvector, tsquery), as well as the
relative weighting system in the textsearch modules. Unfortunately, I
don't have much experience with the relative weighting, but, ts_rank()
should get you partly there.
I also found an interesting page on relative weighting @
http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheet&outputformat=html
that might be useful for you.
> Finally, I'd like
> "watersheds" to come up when someone searches for "water" so I don't
> think I have this configured properly, because this record is not
> returned.
Well, there's two ways to achieve this.
The first is to manually the stopwords, which is getting into the custom
configuration and a bit on the more advanced side.
The other is that in PG8.4, you can do
to_tsquery('water:*')
which will tell the search parser to do a partial match, which would
return "watershed", in this instance.
My testing has shown it to be a bit slower (30ms, vs 15ms for a
non-partial search), but not egregiously slow.
>
> Is there a good tutorial or maybe documentation that is a bit easier
> to understand? I know my database is far from complicated but I can't
> seem to find a configuration that works well. When I try ranking my
> results, most of the results end up with the same rank, so something
> must be wrong?
>
> Can someone point me in the right direction?
>
> Thanks for the help. I appreciate it.
>
Hope this helps. :)
Regards,
--
Aurynn Shaw
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
ashaw(at)commandprompt(dot)com
From | Date | Subject | |
---|---|---|---|
Next Message | Hans-Juergen Schoenig | 2009-11-12 22:26:17 | Re: [GENERAL] pgday.eu |
Previous Message | Michael Nacos | 2009-11-12 21:02:39 | Re: Full Text Search 101? |