Re: Matching on keyword or phrases within a field that is delimited with an "or" operator "|"

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Jim Ostler'" <jowenostler(at)yahoo(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Matching on keyword or phrases within a field that is delimited with an "or" operator "|"
Date: 2012-03-13 00:27:15
Message-ID: 009001cd00b0$0b5f9530$221ebf90$@yahoo.com
Lists: pgsql-general


>> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Jim Ostler
>> Sent: Monday, March 12, 2012 6:57 PM
>> To: pgsql-general(at)postgresql(dot)org
>> Subject: [GENERAL] Matching on keyword or phrases within a field that is delimited with an "or" operator "|"
>>
>> I have a table that is around 20 GB, so I need to optimize as best as
>> possible the matching with another table on keywords across multiple
>> fields. I have around 10 fields that have keywords or phrases delimited
>> with the "or" operator "|". So it would be in the form of
>> "a | b  |  and jack  | cd". There are around 20 keywords or phrases per
>> field, and these keywords could be any word.
>>
>> Because of the size of the database, using a "like" match would take too
>> long. I am not sure if tsvector would work, or if there is a way to
>> indicate how you want it parsed?
>>
>> If I could index these fields somehow that would be best, but I don't
>> want to do the traditional full text indexing as I only want to match
>> whatever is between the " | " whether it is one word or more.
>>
>> The original use of this was as it appears, to have the field
>> "a  |  b  |  c" be read "a or b or c" etc. If there is a way to match
>> using this type of logic with an index that would be great.
>>
>> I hope this is clear enough. Thanks for any help as I am fairly new at
>> this so any direction would be helpful.
>>
>> --Jim

=================================================================

Start with this:

-- In this query the "ANY" is providing the OR capability; use "ALL" for AND
SELECT 'a' = ANY(regexp_split_to_array('a|b|c', '\|'));

and adapt as needed.
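
As a rough sketch of one way to adapt it for an index: the table "items",
the column "keywords", and the index name below are hypothetical, made up
purely for illustration. The idea is to build a GIN expression index over
the split array and query it with the array operators "&&" (overlap) and
"@>" (contains), since a plain "= ANY(...)" test is not something a GIN
index can serve. The pattern '\s*\|\s*' and the btrim() call are my
assumption about how to absorb the uneven spacing around "|" in your sample
data; the literals assume standard_conforming_strings is on.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE items (
    id       serial PRIMARY KEY,
    keywords text            -- e.g. 'a | b  |  and jack  | cd'
);

-- Expression index over the split keyword array.
-- '\s*\|\s*' swallows whitespace around each "|"; btrim() strips the
-- leading/trailing spaces of the whole field.
CREATE INDEX items_keywords_gin
    ON items
 USING gin (regexp_split_to_array(btrim(keywords), '\s*\|\s*'));

-- "OR": rows whose field contains at least one of the listed phrases.
SELECT id
  FROM items
 WHERE regexp_split_to_array(btrim(keywords), '\s*\|\s*')
       && ARRAY['and jack', 'cd'];

-- "AND": rows whose field contains every listed phrase.
SELECT id
  FROM items
 WHERE regexp_split_to_array(btrim(keywords), '\s*\|\s*')
       @> ARRAY['a', 'b'];
```

The expression in the WHERE clause has to match the indexed expression
exactly for the planner to consider the index. Check with EXPLAIN on your
own data before relying on it.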

Regular Expressions are friendly creatures - everybody should have at least
one.

Without an example of a working query that currently does what you want, it
is hard to suggest improvements. Given the lack of details, I do not know
whether the above is even useful for you.

David J.
