From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Andreas Joseph Krogh <andreak(at)officenet(dot)no> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Clarification of the "simple" dictionary |
Date: | 2010-07-22 17:44:38 |
Message-ID: | Pine.LNX.4.64.1007222140470.32129@sn.sai.msu.ru |
Lists: | pgsql-general |
Don't guess, read the docs:
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
12.6.2. Simple Dictionary
The simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the lower-cased form of the word is returned as the normalized lexeme. Alternatively, the dictionary can be configured to report non-stop-words as unrecognized, allowing them to be passed on to the next dictionary in the list.
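The three outcomes described in that paragraph can be sketched in a few lines of Python. This is only a model for illustration, not PostgreSQL's implementation: the function name `simple_lexize` and its parameters `stop_words` and `accept` are made up here, with `accept` standing in for the template's Accept option.

```python
from typing import Iterable, List, Optional


def simple_lexize(token: str,
                  stop_words: Iterable[str] = (),
                  accept: bool = True) -> Optional[List[str]]:
    """Model of the simple dictionary template (illustrative, not PostgreSQL code).

    Returns [] for a stop word (token discarded), [lexeme] for an accepted
    word, or None for "unrecognized" (token goes to the next dictionary).
    """
    lowered = token.lower()
    if lowered in set(stop_words):
        return []          # stop word: empty array, token is dropped
    if accept:
        return [lowered]   # lower-cased form is the normalized lexeme
    return None            # not recognized, pass on to the next dictionary


print(simple_lexize("The"))                                    # ['the']
print(simple_lexize("The", stop_words={"the", "a"}))           # []
print(simple_lexize("YeS", stop_words={"the"}, accept=False))  # None
```

The first call is the interesting one for this thread: with no stop word list configured, even "The" comes back as a lexeme.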
d=# \dFd+ simple
                                        List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+-----------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
By default it has no Init options, so it doesn't check for stopwords.
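So with the default configuration every token survives, only lower-cased, and repeated words collapse into one entry with several positions. A rough Python sketch of that effect on a whole document is below. It is a model under stated assumptions, not PostgreSQL code: plain whitespace splitting stands in for the real text search parser, and `simple_tsvector` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def simple_tsvector(document: str) -> Dict[str, List[int]]:
    """Lower-case every token and record its positions (illustrative model)."""
    positions: Dict[str, List[int]] = defaultdict(list)
    # Assumption: whitespace splitting stands in for the text search parser.
    for pos, token in enumerate(document.split(), start=1):
        positions[token.lower()].append(pos)   # no stop word check at all
    return dict(sorted(positions.items()))


print(simple_tsvector("The cat and THE hat"))
# {'and': [3], 'cat': [2], 'hat': [5], 'the': [1, 4]}
```

Note that "the" and "and" are kept: nothing is dropped unless a stop word file is given in the Init options.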
On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:
> On 07/22/2010 06:27 PM, John Gage wrote:
>> The easiest way to look at this is to give the simple dictionary a document
>> with to_tsvector() and see if stopwords pop out.
>>
>> In my experience they do. In my experience, the simple dictionary just
>> breaks the document down into the space etc. separated words in the
>> document. It doesn't analyze further.
>
> That's my experience too, I just want to make sure it doesn't actually have
> any stopwords which I've missed. Trying many phrases and checking for
> stopwords isn't really proving anything.
>
> Can anybody confirm the "simple" dict. only lowercases the words and
> "uniques" them?
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83