| From: | Joanna Sharman <Joanna(dot)Sharman(at)ed(dot)ac(dot)uk> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | HTML tags and tsearch2 | 
| Date: | 2008-06-26 11:11:58 | 
| Message-ID: | 20080626121158.lb0dui10gg44ck40@www.staffmail.ed.ac.uk | 
| Lists: | pgsql-general | 
Hi,
I have recently started experimenting with tsearch2, and it seems that
the default behaviour is to ignore HTML tags and treat them as
word separators. What I would like instead is for tsearch2 to skip HTML
tags that occur within a word and, rather than splitting the word at
the tag, join the characters on either side of it into a single word.
For example: in the database I have words like 'K<sub>ir</sub>' that  
need to be searched using the term without HTML tags, i.e. 'Kir'.  
Currently, the HTML tags are ignored and two words are stored in the  
vector, 'k' and 'ir'. I would like only one word, 'kir', to be stored  
in the vector, so that searches using the word 'kir' will match the row.
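A minimal sketch in Python, outside PostgreSQL, of the difference between the two behaviours described above. The helper names and the tag-matching regex are illustrative assumptions, not part of tsearch2; the idea is simply that deleting tags (rather than replacing them with a separator) leaves 'K<sub>ir</sub>' as the single word 'kir'.

```python
import re

# Hypothetical pattern: matches an HTML tag such as <sub> or </sub>.
# Simplified: it does not handle '>' inside attribute values or HTML comments.
TAG_RE = re.compile(r"<[^>]*>")


def words_tags_as_separators(text: str) -> list[str]:
    """Replace each tag with a space, mimicking the current behaviour."""
    return TAG_RE.sub(" ", text).lower().split()


def words_tags_removed(text: str) -> list[str]:
    """Delete tags outright so the characters on either side of a tag join up."""
    return TAG_RE.sub("", text).lower().split()


if __name__ == "__main__":
    sample = "K<sub>ir</sub> channel"
    print(words_tags_as_separators(sample))  # ['k', 'ir', 'channel']
    print(words_tags_removed(sample))        # ['kir', 'channel']
```

One caveat with deleting every tag: block-level tags that genuinely separate words (e.g. "foo<br>bar") would also be merged, so in practice only inline tags such as <sub> and <sup> might be worth removing this way.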
A second, related question is whether it is possible to make tsearch2
split words where letters meet digits, e.g. 'TM8' into 'TM' and
'8'.
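As a sketch of the splitting being asked about, again in Python and independent of tsearch2: the hypothetical helper below tokenises text into runs of letters and runs of digits, so a letter/digit boundary becomes a word break. The regex is an assumption chosen for illustration; `[^\W\d_]` matches a letter (a word character that is neither a digit nor an underscore).

```python
import re

# Runs of letters or runs of digits; spaces and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")


def split_letters_digits(text: str) -> list[str]:
    """Break words wherever a letter is followed by a digit or vice versa."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(split_letters_digits("TM8"))          # ['TM', '8']
    print(split_letters_digits("TM8 protein"))  # ['TM', '8', 'protein']
```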
I am not sure whether this can be implemented with tsearch2 or whether
there is a better approach, so I would be grateful for any advice or
pointers to further reading on how I might do this. (I am using
PostgreSQL 8.1.10.)
Many thanks in advance,
Joanna