From: | Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com> |
---|---|
To: | Arthur van der Wal <arthurvanderwal(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Question regarding custom parser |
Date: | 2010-10-05 07:26:54 |
Message-ID: | AANLkTikZFH6mwudStHrzuH2GDA4D8e3P8kPDRa-M-x0e@mail.gmail.com |
Lists: | pgsql-general |
You can create an index on to_tsvector(replace(foo, '-', ' ')) and then
search against that same expression using ...match..(replace(foo, ...), ...)
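
A minimal sketch of that suggestion, under stated assumptions: the table name docs, the index name docs_foo_fts_idx and the 'english' configuration are hypothetical (only the column name foo comes from the message above), and @@ with plainto_tsquery is one possible reading of the elided "match" call. The two-argument form of to_tsvector is used because an expression index needs an immutable expression, and the one-argument form depends on default_text_search_config.

```sql
-- Hypothetical table; only the column name "foo" appears in the thread.
CREATE TABLE docs (
    id  serial PRIMARY KEY,
    foo text NOT NULL
);

-- Index the text with every '-' turned into a space, so the parser never
-- sees a hyphenated word or a signed integer such as "-74".
CREATE INDEX docs_foo_fts_idx
    ON docs
    USING gin (to_tsvector('english', replace(foo, '-', ' ')));

INSERT INTO docs (foo) VALUES ('NL83-V-74-001-001');

-- Apply the same replace() to the search text, and repeat the indexed
-- expression verbatim so the planner can consider the index.
SELECT id, foo
  FROM docs
 WHERE to_tsvector('english', replace(foo, '-', ' '))
       @@ plainto_tsquery('english', replace('v-74', '-', ' '));
```

Because the dictionaries of the chosen configuration still run on each token, stemming keeps working; only the hyphen handling changes.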
On Mon, Oct 4, 2010 at 11:41 AM, Arthur van der Wal <
arthurvanderwal(at)gmail(dot)com> wrote:
> Hi,
>
> I want to change the way PostgreSQL splits text into tokens, for example:
>
> plainto_tsquery('v-74') should split it up as 'v' & '74' instead of 'v' &
> '-74'.
>
> Another example:
>
> select to_tsvector('NL83-V-74-001-001')
> '-001':5,6 '74':4 'nl83':2 'nl83-v':1 'v':3
>
> Searching for 'v-71' does not find the database entry, as the '-' in 'v-71'
> is not indexed. It's hard to determine when PostgreSQL splits things up by
> '-' and when it does not.
>
>
> I tried writing my own parser (based on the test_parser example) which
> does nothing more than split at '-'; however, it seems to me that the logic
> for finding 'base' words and derivatives that PostgreSQL does so nicely
> doesn't work anymore.
>
> Another way would be to disable the (signed) int tokeniser and have the
> unsigned int tokeniser accept leading zeros.
>
> Can anybody point me in the right direction as in how to tackle this
> problem?
>
> Thanks very much in advance,
>
> Arthur van der Wal
>