Question regarding custom parser

From: Arthur van der Wal <arthurvanderwal(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Question regarding custom parser
Date: 2010-10-04 09:39:00
Message-ID: AANLkTi=4h_Qw5HXo4dhk_Dv3vwivR_2wjFm=_oFSggjX@mail.gmail.com
Lists: pgsql-general

Hi,

I want to change the way PostgreSQL splits text into tokens, for example:

plainto_tsquery('v-74') should split it up as 'v' & '74' instead of 'v' &
'-74'.

Another example:

select to_tsvector('NL83-V-74-001-001');

'-001':5,6 '74':4 'nl83':2 'nl83-v':1 'v':3

Searching for 'v-71' does not find the database entry, as the '-' in 'v-71'
is not indexed. It is hard to predict when PostgreSQL splits things up at
'-' and when it does not.
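
For reference, the token types the default parser assigns can be inspected
with ts_debug, ts_parse and ts_token_type. This is only a minimal sketch:
the 'simple' configuration is an arbitrary choice for illustration, not
necessarily the one in use here.

```sql
-- Show how each piece of the input is classified (token type and token).
-- 'simple' is an assumed configuration, used only for illustration.
SELECT alias, description, token
FROM ts_debug('simple', 'NL83-V-74-001-001');

-- Parser-only view, without dictionaries: token type ids and tokens.
SELECT tokid, token
FROM ts_parse('default', 'v-74');

-- Map token type ids to their names for the default parser.
SELECT tokid, alias, description
FROM ts_token_type('default');
```

Comparing the output for the stored text and for the search string should
show where the two are tokenised differently.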

I tried writing my own parser (based on the test_parser example) which
does nothing more than split at '-'. However, it seems to me that the logic
for finding 'base' words and derivatives, which PostgreSQL does so nicely,
doesn't work anymore.

Another way would be to disable the (signed) int tokeniser and have the
unsigned int tokeniser accept leading zeros.

Can anybody point me in the right direction as to how to tackle this
problem?

Thanks very much in advance,

Arthur van der Wal
