From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Brian Hirt <bhirt(at)mobygames(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Need help with full text index configuration |
Date: | 2010-07-29 10:57:30 |
Message-ID: | Pine.LNX.4.64.1007291454150.32129@sn.sai.msu.ru |
Lists: | pgsql-general |
Brian,
you have two options:
1. Write your own parser (just modify the default one)
2. Use the replace() function, for example:
postgres=# select to_tsvector( replace('qw/er/ty','/',' '));
to_tsvector
----------------------
'er':2 'qw':1 'ty':3
(1 row)
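Here is a minimal sketch of option 2 applied to a table. It assumes a hypothetical table search_items with a text column body, so substitute your own names. The same replace() expression appears in the GIN expression index and in the query, which means the '/' is turned into a space before the default parser sees it. The example above calls the one-argument to_tsvector(), and that form depends on default_text_search_config. PostgreSQL only accepts the two-argument form, with an explicit configuration such as 'english', in an expression index.

```sql
-- Hypothetical table: substitute your own table and column names.
CREATE TABLE search_items (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- Expression index: '/' is replaced by a space before parsing, so
-- 'five/six' is indexed as the separate words 'five' and 'six'
-- instead of a single 'file' token.
-- An explicit configuration ('english') is required here because
-- index expressions must not depend on default_text_search_config.
CREATE INDEX search_items_body_fts
    ON search_items
    USING gin (to_tsvector('english', replace(body, '/', ' ')));

INSERT INTO search_items (body) VALUES ('maybe five/six');

-- The WHERE clause repeats the indexed expression verbatim;
-- otherwise the planner cannot consider search_items_body_fts.
SELECT id, body
FROM search_items
WHERE to_tsvector('english', replace(body, '/', ' '))
      @@ to_tsquery('english', 'six');
```

With this setup, a search for 'six' finds the row containing 'five/six'.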
Oleg
On Wed, 28 Jul 2010, Brian Hirt wrote:
> I have some data that can be searched, and it looks like the parser is making some assumptions about the data that aren't true in our case and I'm trying to figure out how to exclude a token type. I haven't been able to find the answer to my question so far, so I thought I would ask here.
>
> The data I have are English words, and sometimes there are words separated by a / without spaces. The parser finds these things and tokenizes them as files. I'm sure in some situations that's the right assumption, but based on my data, I know there will never be a file name in the column.
>
> For example, instead of recognizing three asciiword tokens, the parser recognizes one asciiword and one file. I'd like a way to have the / just get parsed as blank.
>
> db=# select * from ts_debug('english','maybe five/six');
> alias | description | token | dictionaries | dictionary | lexemes
> -----------+-------------------+----------+----------------+--------------+------------
> asciiword | Word, all ASCII | maybe | {english_stem} | english_stem | {mayb}
> blank | Space symbols | | {} | |
> file | File or path name | five/six | {simple} | simple | {five/six}
> (3 rows)
>
> I thought that maybe I could create a new configuration and drop the file mapping, but that doesn't seem to work either.
>
> db=# CREATE TEXT SEARCH CONFIGURATION public.testd ( COPY = pg_catalog.english );
> CREATE TEXT SEARCH CONFIGURATION
> db=# ALTER TEXT SEARCH CONFIGURATION testd DROP MAPPING FOR file;
> ALTER TEXT SEARCH CONFIGURATION
> db=# SELECT * FROM ts_debug('testd','mabye five/six');
> alias | description | token | dictionaries | dictionary | lexemes
> -----------+-------------------+----------+----------------+--------------+---------
> asciiword | Word, all ASCII | mabye | {english_stem} | english_stem | {maby}
> blank | Space symbols | | {} | |
> file | File or path name | five/six | {} | |
> (3 rows)
>
>
> Is there any way to do this?
>
> Thanks for the help in advance. I'm running 8.4.4
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83