Re: Clarification of the "simple" dictionary

From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Clarification of the "simple" dictionary
Date: 2010-07-22 17:56:24
Message-ID: 4C488648.3000602@officenet.no
Lists: pgsql-general

On 07/22/2010 07:44 PM, Oleg Bartunov wrote:
> Don't guess, but read docs
> http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
>
>
> 12.6.2. Simple Dictionary
>
> The simple dictionary template operates by converting the input token
> to lower case and checking it against a file of stop words. If it is
> found in the file then an empty array is returned, causing the token
> to be discarded. If not, the lower-cased form of the word is returned
> as the normalized lexeme. Alternatively, the dictionary can be
> configured to report non-stop-words as unrecognized, allowing them to
> be passed on to the next dictionary in the list.
>
> d=# \dFd+ simple
>                                  List of text search dictionaries
>    Schema   |  Name  |     Template      | Init options |                        Description
> ------------+--------+-------------------+--------------+-----------------------------------------------------------
>  pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
>
> By default it has no Init options, so it doesn't check for stopwords.

Guess what - I *have* read the docs, which say "...and checking it
against a file of stop words". What was unclear to me was whether it
is configured with a stop-word file by default, which I understand from
your reply is not the case. Very good, fits my needs like a
glove :-) It might be worth updating the docs to make this
clearer?

So - can we rely on "simple" to remain this way forever (no Init
options) or is it better to make a copy of it with the same properties
as today?
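
For reference, a minimal sketch of what such a copy could look like. The
name my_simple is hypothetical; the definition mirrors the built-in entry
shown above (template pg_catalog.simple, no init options), so no stop-word
file is consulted:

```sql
-- Hypothetical dictionary name; same template as the built-in "simple",
-- with no options set, so it does not depend on the built-in's defaults.
CREATE TEXT SEARCH DICTIONARY my_simple (
    TEMPLATE = pg_catalog.simple
);

-- With no STOPWORDS option, every token is lower-cased and returned.
SELECT ts_lexize('my_simple', 'The');   -- returns {the}
```

Because the copy states its own options explicitly, it would keep this
behaviour even if the defaults of the built-in dictionary were to change.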

It seems "simple" plus the unaccent dictionary available in 9.0 saves my
day, thanks Mr. Bartunov.
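
A rough sketch of how the two might be combined, assuming the unaccent
contrib module is already installed in the database. The configuration
name my_unaccent_simple is hypothetical, and the token types listed are
only the word-like ones:

```sql
-- Assumption: the unaccent dictionary from contrib is already installed.
-- Hypothetical configuration name, copied from the built-in "simple" config.
CREATE TEXT SEARCH CONFIGURATION my_unaccent_simple (
    COPY = pg_catalog.simple
);

-- unaccent acts as a filtering dictionary: it strips accents and passes
-- the result on to "simple", which lower-cases it.
ALTER TEXT SEARCH CONFIGURATION my_unaccent_simple
    ALTER MAPPING FOR hword, hword_part, word
    WITH unaccent, simple;

SELECT to_tsvector('my_unaccent_simple', 'Hôtel');
```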

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+
