From: | "Gene Selkov Jr(dot)" <selkovjr(at)mcs(dot)anl(dot)gov> |
---|---|
To: | Tony(dot)Curtis(at)vcpc(dot)univie(dot)ac(dot)at |
Cc: | pgsql-sql(at)postgreSQL(dot)org |
Subject: | Re: [SQL] [SQL, regex, words] how to match word boundaries using regex's? |
Date: | 1998-10-19 13:33:23 |
Message-ID: | 199810191432.JAA07847@antares.mcs.anl.gov |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
> I want to do a regex match limited to words.
>
> I tried this:
>
> where ... ~ '\Wword\W';
> where ... ~ '\W*word\W*';
> where ... ~ '\b\(word\)\b';
>
> and other things with LIKE but no joy.
Based on the comments in the source, regexp stuff used in postgres is something like this: http://tiger8.com/us/regexp-manpage.html
I guess there are no backslash macros is POSIX expressions. No joy. By the way, I am wondering what determined the choice of the regexp machine for postgres? Is it performance-related? Is it possible to have the same stuff as in perl?
As to your question, how about a poor man's Altavista like this:
Split the text into words before loading into a special index table. Words are numbered sequentially, so you can search for "phrases":
Table = word
+----------------------------------+----------------------------------+-------+
| Field | Type | Length|
+----------------------------------+----------------------------------+-------+
| rec | char() | 12 |
| seq | int4 | 4 |
| word | text | var |
+----------------------------------+----------------------------------+-------+
SELECT DISTINCT w1.rec
FROM word w1, word w2
WHERE
w1.word ~ '^a$'
AND w2.word ~ '^phrase$'
AND w1.rec = w2.rec
AND w2.seq - w1.seq = 1; -- Distance between the words
This way, you can control what represents the concept of a 'word' by an external program (perl script, etc.)
Certainly, this method will show suboptimal performance with extra large tables and more than three or four words in a seach phrase. But it is possible to optimise by delegating set operations (joins) and position arithmetic to the client. It works very well for my ~500k tables and the most common queries.
--Gene
From | Date | Subject | |
---|---|---|---|
Next Message | Tony.Curtis | 1998-10-19 14:58:58 | Re: [SQL] [SQL, regex, words] how to match word boundaries using regex's? |
Previous Message | Paul Friberg CEO - ISTI | 1998-10-19 13:03:39 | PostgreSQL COPY command |