From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | cheighlund(at)yahoo(dot)com |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Weird problem concerning tsearch functions built into postgres 8.3, assistance requested |
Date: | 2008-10-30 13:37:40 |
Message-ID: | 4909B8A4.4050706@sigaev.ru |
Lists: | pgsql-general |
> One of the tables we're using in the 8.1.3 setups currently running
> includes phone numbers as a searchable field (fti_phone), with the
> results of a select on the field generally looking like this: 'MMM':2
> 'NNNN':3 'MMM-NNNN':1. MMM is the first three digits, NNNN is the
> fourth-seventh.
>
> The weird part is this: On the old systems running 8.1.3, I can look up
> a record by
> fti_phone using any of the three above items; first three, last four, or
> entire number including dash. On the new system running 8.3.1, I can do
> a lookup by the first three or the last four and get the results I'm
> after, but any attempt to do a lookup by the entire MMM-NNNN version
> returns no records.
The parser was changed in 8.3:
postgres=# select * from ts_debug('123-4567');
 alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 123   | {simple}     | simple     | {123}
 int   | Signed integer   | -4567 | {simple}     | simple     | {-4567}
(2 rows)
postgres=# select * from ts_debug('abc-defj');
      alias      |           description           |  token   |  dictionaries  |  dictionary  |  lexemes
-----------------+---------------------------------+----------+----------------+--------------+------------
 asciihword      | Hyphenated word, all ASCII      | abc-defj | {english_stem} | english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc      | {english_stem} | english_stem | {abc}
 blank           | Space symbols                   | -        | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | defj     | {english_stem} | english_stem | {defj}
The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats
[digit]+-[digit]+ as two separate numbers, so the combined 'MMM-NNNN' lexeme is no
longer produced.
So you can either pre-process the text before indexing, or have a look at the
regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html)
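A minimal sketch of the pre-processing route, assuming it is done on the
application side before the text reaches to_tsvector. The function name
expand_hyphenated_numbers and the choice to append a joined, hyphen-free form
are illustrative assumptions, not part of tsearch:

```python
import re

# Two digit runs joined by a single hyphen, e.g. "123-4567".
# The lookarounds skip longer chains such as "555-123-4567" and
# anything glued to letters, so only the simple MMM-NNNN shape matches.
_DIGITS_HYPHEN = re.compile(r"(?<![\w-])(\d+)-(\d+)(?![\w-])")


def expand_hyphenated_numbers(text: str) -> str:
    """Rewrite 'MMM-NNNN' as 'MMM NNNN MMMNNNN' (hypothetical helper).

    The 8.3 parser would split 'MMM-NNNN' into 'MMM' and '-NNNN'.
    Emitting the parts plus a joined form gives it three plain
    unsigned integers to index instead.
    """
    def _expand(match: re.Match) -> str:
        first, second = match.group(1), match.group(2)
        return f"{first} {second} {first}{second}"

    return _DIGITS_HYPHEN.sub(_expand, text)


if __name__ == "__main__":
    print(expand_hyphenated_numbers("call 123-4567 now"))
```

With text indexed this way, a search for the whole number would have to use the
joined form (1234567 rather than 123-4567), so the same normalisation needs to
be applied to the search term as well.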
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/