Re: Clarification of the "simple" dictionary

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Clarification of the "simple" dictionary
Date: 2010-07-22 17:44:38
Message-ID: Pine.LNX.4.64.1007222140470.32129@sn.sai.msu.ru
Lists: pgsql-general

Don't guess; read the docs:
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the lower-cased form of the word is returned as the normalized lexeme. Alternatively, the dictionary can be configured to report non-stop-words as unrecognized, allowing them to be passed on to the next dictionary in the list.
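
As a rough sketch of the two configurations described above (the dictionary name public.simple_dict is only illustrative, and this assumes the english stop word file that ships with PostgreSQL is present in $SHAREDIR/tsearch_data):

```sql
-- Hypothetical dictionary built on the simple template, with a stop word file.
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

-- Non-stop-word: lower-cased form is returned as the lexeme.
SELECT ts_lexize('public.simple_dict', 'YeS');   -- {yes}

-- Stop word: empty array, so the token is discarded.
SELECT ts_lexize('public.simple_dict', 'The');   -- {}

-- Alternative mode: report non-stop-words as unrecognized so that
-- they are passed on to the next dictionary in the list.
ALTER TEXT SEARCH DICTIONARY public.simple_dict ( Accept = false );

SELECT ts_lexize('public.simple_dict', 'YeS');   -- NULL (unrecognized)
SELECT ts_lexize('public.simple_dict', 'The');   -- {} (still a stop word)
```

With Accept = false the dictionary should not be placed last in the list, since it never recognizes any token itself.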

d=# \dFd+ simple
                                          List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+-----------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword

By default it has no Init options (no STOPWORDS file is configured), so it doesn't check for stopwords.
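
One way to check this directly, assuming a stock installation where the built-in simple dictionary and the simple configuration have not been altered:

```sql
-- The built-in dictionary only lower-cases; 'The' is not dropped.
SELECT ts_lexize('simple', 'The');
-- {the}

-- Through the 'simple' configuration, repeated words collapse into
-- one lexeme that keeps every position.
SELECT to_tsvector('simple', 'The quick the fox');
-- 'fox':4 'quick':2 'the':1,3
```

If a common word such as "the" comes back as a lexeme, no stop word list is in effect for that dictionary.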

On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:

> On 07/22/2010 06:27 PM, John Gage wrote:
>> The easiest way to look at this is to give the simple dictionary a document
>> with to_tsvector() and see if stopwords pop out.
>>
>> In my experience they do. In my experience, the simple dictionary just
>> breaks the document down into the space etc. separated words in the
>> document. It doesn't analyze further.
>
> That's my experience too, I just want to make sure it doesn't actually have
> any stopwords which I've missed. Trying many phrases and checking for
> stopwords isn't really proving anything.
>
> Can anybody confirm the "simple" dict. only lowercases the words and
> "uniques" them?
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
