HTML tags and tsearch2

From:	Joanna Sharman <Joanna(dot)Sharman(at)ed(dot)ac(dot)uk>
To:	pgsql-general(at)postgresql(dot)org
Subject:	HTML tags and tsearch2
Date:	2008-06-26 11:11:58
Message-ID:	20080626121158.lb0dui10gg44ck40@www.staffmail.ed.ac.uk
Lists:	pgsql-general

Hi,

I have recently started experimenting with tsearch2, and it seems that
the default behaviour is to ignore HTML tags and treat them as
word separators. What I would like instead is for HTML tags within
words to be dropped entirely, so that the characters on either side of
a tag are combined into one word rather than stored as separate words.

For example: in the database I have words like 'K<sub>ir</sub>' that
need to be searched using the term without HTML tags, i.e. 'Kir'.
Currently, the HTML tags are ignored and two words are stored in the
vector, 'k' and 'ir'. I would like only one word, 'kir', to be stored
in the vector, so that searches using the word 'kir' will match the row.
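
To make the desired behaviour concrete, here is a minimal sketch in
Python. It assumes the text can be pre-processed before it is handed to
tsearch2 (for example before calling to_tsvector); the helper name
strip_tags and the simple tag pattern are hypothetical, and the pattern
is not a full HTML parser.

```python
import re

# Hypothetical pattern: anything between '<' and the next '>' counts as a tag.
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(text):
    """Delete HTML tags without inserting a separator in their place."""
    return TAG_RE.sub("", text)

print(strip_tags("K<sub>ir</sub>"))  # prints: Kir
```

Because the tag is replaced by an empty string rather than a space, the
characters on either side are joined into a single word.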

A second, related question is whether it is possible to cause tsearch2
to split up words when it encounters digits, e.g. 'TM8' into 'TM' and
'8'.
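
Again as a sketch only, the split I have in mind could be expressed as
a pre-processing step like the one below. The function name
split_on_digits is hypothetical, and it only considers boundaries
between ASCII letters and digits.

```python
import re

# Zero-width match at every boundary between an ASCII letter and a digit.
BOUNDARY_RE = re.compile(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])")

def split_on_digits(text):
    """Insert a space wherever a letter meets a digit, or a digit a letter."""
    return BOUNDARY_RE.sub(" ", text)

print(split_on_digits("TM8"))  # prints: TM 8
```

The output would then be indexed as two words, 'TM' and '8'.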

I am not sure whether this functionality can be implemented using
tsearch2 or whether there might be a better way, so I would be grateful
for any advice or pointers to further reading on how I might do this.
(I am using PostgreSQL version 8.1.10.)

Many thanks in advance,
Joanna

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Responses

Re: HTML tags and tsearch2 at 2008-06-26 12:05:09 from Oleg Bartunov
