Re: Question regarding custom parser

From:	Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com>
To:	Arthur van der Wal <arthurvanderwal(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Question regarding custom parser
Date:	2010-10-05 07:26:54
Message-ID:	AANLkTikZFH6mwudStHrzuH2GDA4D8e3P8kPDRa-M-x0e@mail.gmail.com
Lists:	pgsql-general

You can create an index on to_tsvector(replace(foo, '-', ' ')) and then
search using the same replace() expression: ...match..(replace(foo, ...), ...)
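
As a minimal sketch of that idea: the table name "items", the index name and
the 'simple' text search configuration below are assumptions made for
illustration and do not come from the original question; foo stands for the
text column being searched.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE items (
    id  serial PRIMARY KEY,
    foo text
);

-- Expression index over the text with hyphens replaced by spaces.
-- The two-argument form of to_tsvector is used because index expressions
-- must be immutable; the one-argument form depends on the
-- default_text_search_config setting and cannot be indexed directly.
CREATE INDEX items_foo_fts_idx
    ON items
    USING gin (to_tsvector('simple', replace(foo, '-', ' ')));

-- Apply the same replace() to the search string, so the document and the
-- query are tokenized the same way and the hyphen never reaches the parser.
SELECT id, foo
FROM items
WHERE to_tsvector('simple', replace(foo, '-', ' '))
      @@ plainto_tsquery('simple', replace('v-74', '-', ' '));
```

The WHERE clause has to repeat the indexed expression exactly for the planner
to consider the index. If stemming matters, a language configuration such as
'english' could be used in place of 'simple' in all three places.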

On Mon, Oct 4, 2010 at 11:41 AM, Arthur van der Wal <
arthurvanderwal(at)gmail(dot)com> wrote:

> Hi,
>
> I want to change the way PostgreSQL splits text into tokens, for example:
>
> plainto_tsquery('v-74') should split it up as 'v' & '74' instead of 'v' &
> '-74'.
>
> Another example:
>
> select to_tsvector('NL83-V-74-001-001')
> '-001':5,6 '74':4 'nl83':2 'nl83-v':1 'v':3
>
> Searching for 'v-71' does not find the database entry, because the '-' in
> 'v-71' is not indexed. It's hard to tell when PostgreSQL splits things up
> by '-' and when it does not.
>
>
> I tried writing my own parser (based on the test_parser example) which
> does nothing more than split at '-'. However, it seems to me that the logic
> for finding 'base' words and derivatives that PostgreSQL does so nicely
> doesn't work anymore.
>
> Another way would be to disable the (signed) int tokeniser and have the
> unsigned int tokeniser accept leading zeros.
>
> Can anybody point me in the right direction on how to tackle this
> problem?
>
> Thanks very much in advance,
>
> Arthur van der Wal
>
