From: | Patrick Dung <patrick_dkt(at)yahoo(dot)com(dot)hk> |
---|---|
To: | Pgsql-general General <pgsql-general(at)postgresql(dot)org> |
Subject: | Find similar records (compare tsvectors) |
Date: | 2015-03-02 14:57:56 |
Message-ID: | 514327739.917376.1425308276228.JavaMail.yahoo@mail.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I had a database with articles or attachment stored in bytea format.I also had a trigger: it insert/update the tsv column when a record is added/updated.The tsv column had a GIN index.With this setting, I can do very fast keyword search on the tsv.
Suppose I had a specific record (id=100000).How to list similar records based on ranking?In that case, I had to compare a tsvector with another tsvector.
I had this SQL which make the original tsv as a text and then to tsquery, Then I can compare a tsv and a tsquery.
SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, (SELECT tsv, id FROM items WHERE id=100000) AS original WHERE i.id != original.id ORDER BY similarity;
items table:id bigint
company varchar
industry varchardescription varcharpost_timestamp timestampattachment bytea
tsv tsvector
The problem is that this is very slow.Any comment?
Thank and regards,Patrick
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Klaver | 2015-03-02 15:21:46 | Re: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple" |
Previous Message | Ryan King | 2015-03-02 13:40:40 | Re: Regarding "Point-in-time Recovery" feature |