From: | Andreas Joseph Krogh <andreak(at)officenet(dot)no> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Clarification of the "simple" dictionary |
Date: | 2010-07-22 17:56:24 |
Message-ID: | 4C488648.3000602@officenet.no |
Lists: | pgsql-general |
On 07/22/2010 07:44 PM, Oleg Bartunov wrote:
> Don't guess, but read docs
> http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
>
>
> 12.6.2. Simple Dictionary
>
> The simple dictionary template operates by converting the input token
> to lower case and checking it against a file of stop words. If it is
> found in the file then an empty array is returned, causing the token
> to be discarded. If not, the lower-cased form of the word is returned
> as the normalized lexeme. Alternatively, the dictionary can be
> configured to report non-stop-words as unrecognized, allowing them to
> be passed on to the next dictionary in the list.
>
> d=# \dFd+ simple
>                                 List of text search dictionaries
>    Schema   |  Name  |     Template      | Init options |                        Description
> ------------+--------+-------------------+--------------+-----------------------------------------------------------
>  pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
>
> By default it has no Init options, so it doesn't check for stopwords.
Guess what - I *have* read the docs, which say "...and checking it
against a file of stop words". What was unclear to me was whether it is
configured with a stop-word file by default; I understand from your
reply that it is not. Very good, fits my needs like a glove :-) It
might be worth updating the docs to make this clearer.
So - can we rely on "simple" to remain this way forever (no Init
options) or is it better to make a copy of it with the same properties
as today?
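For illustration, such a copy could be created along these lines. This is
only a minimal sketch: the name my_simple is hypothetical, and it assumes
the built-in pg_catalog.simple template with no stop-word file, matching
what "simple" does today.

```sql
-- Hypothetical name "my_simple"; uses the built-in simple template.
-- No STOPWORDS option is given, so no stop-word file is consulted.
CREATE TEXT SEARCH DICTIONARY my_simple (
    TEMPLATE = pg_catalog.simple
);

-- Every token is just lower-cased, including common English words.
SELECT ts_lexize('my_simple', 'The');   -- {the}
```

A copy like this carries its own options, so it would keep behaving the
same way even if the options of the built-in "simple" dictionary were
altered later.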
It seems "simple" + the unaccent dict. available in 9.0 saves my day,
thanks Mr. Bartunov.
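
A rough sketch of how the two might be chained, assuming the unaccent
contrib module is already installed in the database; the configuration
name my_cfg is hypothetical:

```sql
-- Assumes the unaccent contrib module is already installed here.
-- "my_cfg" is a hypothetical configuration name.
CREATE TEXT SEARCH CONFIGURATION my_cfg ( COPY = pg_catalog.simple );

-- unaccent is a filtering dictionary: it strips accents and passes the
-- token on to the next dictionary, here "simple", which lower-cases it.
ALTER TEXT SEARCH CONFIGURATION my_cfg
    ALTER MAPPING FOR hword, hword_part, word
    WITH unaccent, simple;

SELECT to_tsvector('my_cfg', 'Hôtel de la Mer');
```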
--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+