
Re: using Tsearch2 for chemical text

From:	Naz Gassiep <naz(at)mira(dot)net>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Cc:	Rajarshi Guha <rguha(at)indiana(dot)edu>
Subject:	Re: using Tsearch2 for chemical text
Date:	2007-07-26 05:53:05
Message-ID:	46A836C1.8080905@mira.net
Lists:	pgsql-general

> I think you might need to write a custom lexer to divide the strings
> into meaningful units. If there are subsections of these names that
> make sense to search for, then tsearch2 can certainly handle the
> mechanics of that, but I doubt that the standard rules will divide
> these names into lexemes usefully.

A custom lexer for tsearch2 that recognized chemistry-related lexical
components (di-, tetra-, acetyl-, ethan-, -oic, -ane, -ene, etc.) would
hugely increase the out-of-the-box applicability of PostgreSQL to
scientific applications. Perhaps such an effort could be coordinated
with physics- and biology-related lexers, to provide a unified lexer
with full scientific capabilities, in the way that PostGIS provides
unified geospatial capabilities.
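To make "dividing names into meaningful units" concrete, here is a minimal Python sketch of the segmentation step only. It is not tsearch2's parser interface (a real tsearch2 parser would presumably have to be written in C as a PostgreSQL module). The `MORPHEMES` list and the `segment()` function are hypothetical, chosen for illustration. The sketch assumes a single word as input, so locants such as "1,2-", spaces and punctuation are not handled.

```python
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny affix list: the examples from the message
# plus a few common stems. A usable lexer would need a curated IUPAC vocabulary.
MORPHEMES = [
    "di", "tri", "tetra", "acetyl", "chloro", "meth", "eth", "ethan",
    "prop", "but", "oic", "ane", "ene", "yl", "ol", "acid",
]

# A DP entry: ((unmatched_chars, morphemes_used), start_index, piece, known)
Entry = Tuple[Tuple[int, int], int, str, bool]


def segment(word: str, morphemes: List[str] = MORPHEMES) -> List[str]:
    """Split a single chemical word into known morphemes (hypothetical helper).

    Dynamic programming over character positions: a split is scored by
    (unmatched characters, morphemes used), compared lexicographically, so
    the split covering the most text wins and ties go to fewer, longer
    morphemes. Runs of unrecognised characters come back as one token.
    """
    s = word.lower()
    n = len(s)
    best: List[Optional[Entry]] = [None] * (n + 1)
    best[0] = ((0, 0), 0, "", True)

    def relax(j: int, entry: Entry) -> None:
        # Keep the cheaper of the existing and the candidate entry for s[:j].
        current = best[j]
        if current is None or entry[0] < current[0]:
            best[j] = entry

    for i in range(n):
        entry = best[i]
        if entry is None:
            continue
        unmatched, used = entry[0]
        # Option 1: treat s[i] as an unmatched character.
        relax(i + 1, ((unmatched + 1, used), i, s[i], False))
        # Option 2: consume any known morpheme that starts at position i.
        for m in morphemes:
            if s.startswith(m, i):
                relax(i + len(m), ((unmatched, used + 1), i, m, True))

    # Follow back-pointers from the end to recover the chosen pieces.
    pieces: List[Tuple[str, bool]] = []
    i = n
    while i > 0:
        entry = best[i]
        assert entry is not None  # every position is reachable via option 1
        _, start, piece, known = entry
        pieces.append((piece, known))
        i = start
    pieces.reverse()

    # Merge consecutive unmatched characters into a single token.
    tokens: List[str] = []
    prev_known = True
    for piece, known in pieces:
        if not known and not prev_known:
            tokens[-1] += piece
        else:
            tokens.append(piece)
        prev_known = known
    return tokens


if __name__ == "__main__":
    for name in ["tetrachloroethane", "ethanoic", "dimethylbutane", "benzene"]:
        print(name, "->", segment(name))
```

With this list, `segment("tetrachloroethane")` gives `["tetra", "chloro", "eth", "ane"]`, and "ethane" comes out as `eth` + `ane` rather than `ethan` + `e`. That happens because the scoring prefers splits that leave no characters unrecognised. A greedy longest-match scan would make exactly that second mistake, which is why the sketch uses dynamic programming instead.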

I don't know how best to bring such an effort about, but I do know that
if such a thing were created it would be a boon for PostgreSQL, giving
it a very significant leg up in terms of functionality, not to mention
the great positive impact that the wide, free availability of such a
tool would have on the scientific research community.

In response to

Re: using Tsearch2 for chemical text at 2007-07-25 22:51:15 from Tom Lane

Responses

Re: using Tsearch2 for chemical text at 2007-07-26 06:08:37 from Oleg Bartunov
