From: | Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | tsearch2 for alphabetic character strings & codes |
Date: | 2005-09-23 21:43:39 |
Message-ID: | Pine.LNX.4.58.0509231355220.1550@greenie.cheapcomplexdevices.com |
Lists: | pgsql-general |
I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are codes rather than alphabetic words, so I'm having a tough time
trying to use the current tsearch2 configurations with them.
For example, using tsearch to search for codes like
'31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.
fli=# select
fli-# to_tsvector('default','31.03(e)(2)(A)'),
fli-# to_tsvector('simple','31.03(e)(2)(A)');
to_tsvector | to_tsvector
-----------------------+-----------------------------
'2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)
I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.
I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.
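A minimal sketch of the behavior described above, written in Python
rather than as a tsearch2 configuration. It only illustrates the
desired tokenization: split on whitespace, apply no dictionary, and
record word positions. The function names whitespace_tsvector and
format_tsvector are hypothetical, and lowercasing each token (as the
'simple' configuration does) is an assumption that can be switched off.

```python
def whitespace_tsvector(text, lowercase=True):
    """Map each whitespace-delimited token to its 1-based positions.

    Punctuation is never treated as a separator, so a code such as
    31.03(e)(2)(A) stays a single lexeme.
    """
    positions = {}
    for pos, token in enumerate(text.split(), start=1):
        # Assumption: fold case like the 'simple' configuration does.
        lexeme = token.lower() if lowercase else token
        positions.setdefault(lexeme, []).append(pos)
    return positions


def format_tsvector(positions):
    """Render the mapping in a tsvector-like form: 'lexeme':pos,pos ..."""
    return " ".join(
        "'{}':{}".format(lexeme, ",".join(str(p) for p in plist))
        for lexeme, plist in sorted(positions.items())
    )


print(format_tsvector(whitespace_tsvector("see 31.03(e)(2)(A) and 31.03(e)(2)(B)")))
# prints: '31.03(e)(2)(a)':2 '31.03(e)(2)(b)':4 'and':3 'see':1
```

With this kind of tokenization the whole code is one lexeme, so a
search for '31.03(e)(2)(A)' would match only documents that contain
that exact code.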
Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking at?
Ron