Re: Similarity search for sentences

From: Rémi Cura <remi(dot)cura(at)gmail(dot)com>
To: Janek Sendrowski <janek12(at)web(dot)de>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Similarity search for sentences
Date: 2013-12-05 12:12:55
Message-ID: CAJvUf_tb_bdk4nCMHMfRB2XvFTUxzYW0ho6kL5ALGp2y3nKxvg@mail.gmail.com
Lists: pgsql-general

This may be a totally bad idea:
explode each sentence into (sentence_number, one_word) rows, one row per
word (this makes a big table, so you may want to partition it),
then put a classic index on sentence_number and another on one_word (btree
if you compare with =, something more subtle if you do "like 'word'").
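
A minimal sketch of the exploded table, assuming an in-memory SQLite database
via Python's sqlite3 module as a stand-in for PostgreSQL. The sample
sentences, the table name sentence_word, the index names, and the helper
similar_sentences are all hypothetical; ranking by the number of shared words
is only one possible way to score similarity.

```python
import sqlite3

# Hypothetical sample data; in the real setup these would be the sentence rows.
sentences = {
    1: "the cat sat on the mat",
    2: "a dog sat on the rug",
    3: "stock prices fell sharply",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentence_word (sentence_number INTEGER, one_word TEXT)")

# Explode: one row per distinct word of each sentence.
rows = [
    (num, word)
    for num, text in sentences.items()
    for word in set(text.lower().split())
]
conn.executemany("INSERT INTO sentence_word VALUES (?, ?)", rows)

# Classic (btree) indexes on the sentence number and on the word.
conn.execute("CREATE INDEX idx_sw_sentence ON sentence_word (sentence_number)")
conn.execute("CREATE INDEX idx_sw_word ON sentence_word (one_word)")


def similar_sentences(query, limit=10):
    """Return (sentence_number, shared_word_count) pairs, best match first."""
    words = sorted(set(query.lower().split()))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT sentence_number, COUNT(*) AS shared "
        "FROM sentence_word "
        f"WHERE one_word IN ({placeholders}) "
        "GROUP BY sentence_number "
        "ORDER BY shared DESC, sentence_number "
        "LIMIT ?"
    )
    return conn.execute(sql, (*words, limit)).fetchall()


result = similar_sentences("the cat sat")
print(result)
```

The equality lookup on one_word is what the btree index serves; stemming the
words before inserting them would be an extra step that this sketch leaves out.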

Depending on performance, it could be worth regrouping by word:
(sentence_number[], one_word)
Then you could try an array or hstore for sentence_number[]?
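
The regrouped layout amounts to one entry per word holding the list of
sentence numbers that contain it. A rough sketch in plain Python, where a dict
of lists stands in for the array column; the sample sentences and the names
postings and shared_word_counts are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical sample data.
sentences = {
    1: "the cat sat on the mat",
    2: "a dog sat on the rug",
    3: "stock prices fell sharply",
}

# Regroup by word: each word maps to the sentence numbers that contain it.
postings = defaultdict(list)
for num in sorted(sentences):
    for word in set(sentences[num].lower().split()):
        postings[word].append(num)


def shared_word_counts(query):
    """Count, per sentence, how many distinct query words it contains."""
    counts = Counter()
    for word in set(query.lower().split()):
        counts.update(postings.get(word, ()))
    # Most shared words first, ties broken by sentence number.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(postings["sat"])
print(shared_word_counts("the cat sat"))
```

Compared with the one-row-per-word table, this keeps a single row per word, at
the price of rewriting a whole list whenever a sentence is added or removed.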

Cheers,

Rémi-C

2013/12/5 Janek Sendrowski <janek12(at)web(dot)de>

> Hi,
>
> I have tables with millions of sentences. Each row contains a sentence. It
> is natural language and every language is possible, but the sentences of
> one table have the same language.
> I have to do a similarity search on them. It has to be very fast,
> because I have to search for a few hundred sentences many times.
> The search shouldn't be context-based. It should just get sentences with
> similar words (maybe stemmed).
>
> I have already tried GiST/GIN-index-based trigram search (pg_trgm
> extension), full-text search (tsearch2 extension), and pivot-based indexing
> (Fixed Query Array), but they are all too slow or not suitable.
> Soundex and Metaphone aren't suitable either.
>
> I have been working on this project for a long time, but without any
> success.
> Do any of you have an idea?
>
> I would be very thankful for help.
>
> Janek Sendrowski
