tsearch2 and hyphenated terms

From:	Reece Hart <reece(at)harts(dot)net>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	tsearch2 and hyphenated terms
Date:	2008-04-11 05:17:25
Message-ID:	1207891045.6903.14.camel@snafu
Lists:	pgsql-general

I'd like to use tsearch2 to index protein and gene names. Unfortunately,
such names are written inconsistently and sometimes with hyphens. For
example, MCL-1 and MCL1 are semantically equivalent, but with the default
parser and to_tsvector they produce different lexemes:

unison@u8.3=> select to_tsvector('MCL1 MCL-1');
       to_tsvector
-------------------------
 '-1':3 'mcl':2 'mcl1':1

For the purposes of indexing these names, I suspect I'd get the majority
of cases by removing a hyphen when it's followed by 1 or 2 chars from
[a-zA-Z0-9]. Does that require a custom parser?
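
To make the rule above concrete, here is a minimal sketch of it as a pre-processing step applied to the text before it reaches to_tsvector. It is written in Python only for illustration. The function name normalize_name is hypothetical, and the extra requirement that the hyphen be preceded by an alphanumeric character (so a bare "-1" is left alone) is an assumption on top of the rule as stated.

```python
import re

# A hyphen that sits between an alphanumeric character and a trailing run
# of exactly 1 or 2 characters from [A-Za-z0-9]. The negative lookahead
# ensures the run is not the start of a longer word.
# Assumption: requiring a preceding alphanumeric is not part of the stated
# rule; it is added here so a standalone "-1" is not altered.
_SHORT_SUFFIX_HYPHEN = re.compile(
    r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9]{1,2}(?![A-Za-z0-9]))"
)


def normalize_name(text: str) -> str:
    """Drop hyphens that are followed by only 1 or 2 alphanumerics."""
    return _SHORT_SUFFIX_HYPHEN.sub("", text)


if __name__ == "__main__":
    for sample in ("MCL1 MCL-1", "BCL-XL", "alpha-tubulin"):
        print(sample, "->", normalize_name(sample))
```

With this approach both spellings reach the indexer as the same token, at the cost of having to apply the same normalization to query strings as well.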

Thanks,
Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Responses

Re: tsearch2 and hyphenated terms at 2008-04-11 16:45:32 from Tom Lane
Re: tsearch2 and hyphenated terms at 2008-04-11 18:07:14 from Oleg Bartunov
