| From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> | 
|---|---|
| To: | Andreas Joseph Krogh <andreak(at)officenet(dot)no> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: Clarification of the "simple" dictionary | 
| Date: | 2010-07-22 17:44:38 | 
| Message-ID: | Pine.LNX.4.64.1007222140470.32129@sn.sai.msu.ru | 
| Lists: | pgsql-general | 
Don't guess; read the docs:
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
12.6.2. Simple Dictionary
The simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the lower-cased form of the word is returned as the normalized lexeme. Alternatively, the dictionary can be configured to report non-stop-words as unrecognized, allowing them to be passed on to the next dictionary in the list.
d=# \dFd+ simple
                                      List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+-----------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
By default it has no Init options, i.e. no stop-word file is configured, so it doesn't check for stopwords; it only lower-cases each token.
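
If you do want stop-word filtering from the simple template, you have to create a dictionary with a STOPWORDS option. A minimal sketch along the lines of the example in that docs section; the name public.simple_dict is only an illustrative choice, and "english" assumes the english.stop file shipped in $SHAREDIR/tsearch_data is installed:

```sql
-- Built-in "simple" dictionary: no stop-word file, so tokens are only lower-cased.
SELECT ts_lexize('simple', 'The');               -- {the}

-- Illustrative dictionary built on the same template, with a stop-word file.
-- STOPWORDS = english reads $SHAREDIR/tsearch_data/english.stop.
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

SELECT ts_lexize('public.simple_dict', 'The');   -- {}     stop word, discarded
SELECT ts_lexize('public.simple_dict', 'YeS');   -- {yes}  lower-cased lexeme

-- Optional: report non-stop-words as unrecognized, so they are
-- passed on to the next dictionary in the configuration's list.
ALTER TEXT SEARCH DICTIONARY public.simple_dict ( Accept = false );

SELECT ts_lexize('public.simple_dict', 'YeS');   -- NULL   unrecognized
SELECT ts_lexize('public.simple_dict', 'The');   -- {}     still a stop word
```

Comparing ts_lexize() on the built-in dictionary and on one created with STOPWORDS shows the difference directly, without having to try many phrases through to_tsvector().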
On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:
> On 07/22/2010 06:27 PM, John Gage wrote:
>> The easiest way to look at this is to give the simple dictionary a document 
>> with to_tsvector() and see if stopwords pop out.
>> 
>> In my experience they do.  In my experience, the simple dictionary just 
>> breaks the document down into the space etc. separated words in the 
>> document.  It doesn't analyze further.
>
> That's my experience too, I just want to make sure it doesn't actually have 
> any stopwords which I've missed. Trying many phrases and checking for 
> stopwords isn't really proving anything.
>
> Can anybody confirm the "simple" dict. only lowercases the words and 
> "uniques" them?
>
>
 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83