| From: | Marius Andreiana <mandreiana(at)yahoo(dot)com> |
|---|---|
| To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
| Subject: | word/phrase extraction & ranking |
| Date: | 2012-11-14 18:34:10 |
| Message-ID: | 1352918050.97151.YahooMailNeo@web140704.mail.bf1.yahoo.com |
| Lists: | pgsql-general |
Hello,
From selected rows in a table, how can one extract and rank words/phrases based on how often they occur?
Here's an example: http://developer.yahoo.com/search/content/V1/termExtraction.html
INPUT:
CREATE TABLE phrases (
id BIGSERIAL,
phrase VARCHAR(10000));
INSERT INTO phrases (phrase) VALUES ('Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.');
INSERT INTO phrases (phrase) VALUES ('Andrea Bolgi was an italian sculptor');
OUTPUT:
phrase | weight
italian sculptor | 5
virgin mary | 2
painters | 1
renaissance | 1
inspiration | 1
Andrea Bolgi | 1
Some notes:
* Phrases may contain “stop words”, e.g. “easy to answer”.
* Ideally, English-language variations and synonyms would be grouped automatically.
I understand one might use PostgreSQL’s full text search support, and maybe pg_trgm, but how exactly would the two be combined to produce the output above?
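For single words at least, a minimal sketch using the built-in ts_stat function might look like the query below. It assumes the phrases table from the example above and the stock 'english' text search configuration; the weight alias and the LIMIT are only illustrative choices. It does not extract multi-word phrases or group synonyms.

```sql
-- Sketch only: counts single lexemes, not multi-word phrases.
-- Assumes the phrases table defined above and the built-in 'english'
-- configuration, which removes stop words and stems each word.
SELECT word,
       nentry AS weight,  -- total occurrences across the selected rows
       ndoc               -- number of rows that contain the lexeme
FROM ts_stat($$ SELECT to_tsvector('english', phrase) FROM phrases $$)
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 20;
```

Note that the word column holds stemmed lexemes rather than the original words, so "sculptors" and "sculptor" are counted together. Using the 'simple' configuration instead would keep words unstemmed and would not drop stop words.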
Thanks