From: | Rémi Cura <remi(dot)cura(at)gmail(dot)com> |
---|---|
To: | Janek Sendrowski <janek12(at)web(dot)de> |
Cc: | PostgreSQL General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Similarity search for sentences |
Date: | 2013-12-05 12:12:55 |
Message-ID: | CAJvUf_tb_bdk4nCMHMfRB2XvFTUxzYW0ho6kL5ALGp2y3nKxvg@mail.gmail.com |
Lists: | pgsql-general |
This may be a totally bad idea, but:
explode each sentence into (sentence_number, one_word) rows, one row per word
(this makes a big table, so you may want to partition it).
Then put a classic index on sentence_number and on one_word (a btree if you
compare with =, something subtler if you use LIKE 'word').
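As a rough illustration of the exploded table, here is a minimal sketch that uses Python's standard sqlite3 module as a stand-in for PostgreSQL, so it runs without a server. Several choices are hypothetical and not part of the suggestion above: the table name sentence_word, the sample sentences, the lowercase whitespace tokenizer (no stemming), and ranking by the number of shared words.

```python
import sqlite3

# Hypothetical sample data: sentence_number -> sentence text.
SENTENCES = {
    1: "the cat sat on the mat",
    2: "a dog sat on the rug",
    3: "the cat chased the dog",
}


def tokenize(text):
    # Assumption: lowercase + whitespace split, no stemming.
    return text.lower().split()


def build_index(conn, sentences):
    # The "exploded" table: one row per (sentence_number, one_word).
    conn.execute(
        "CREATE TABLE sentence_word (sentence_number INTEGER, one_word TEXT)"
    )
    # set() keeps one row per distinct word per sentence.
    conn.executemany(
        "INSERT INTO sentence_word VALUES (?, ?)",
        [(n, w) for n, text in sentences.items() for w in set(tokenize(text))],
    )
    # Plain B-tree indexes, suited to = comparisons.
    conn.execute("CREATE INDEX sw_word_idx ON sentence_word (one_word)")
    conn.execute("CREATE INDEX sw_num_idx ON sentence_word (sentence_number)")


def search(conn, query, limit=5):
    words = sorted(set(tokenize(query)))
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    # Rank sentences by how many distinct query words they contain.
    sql = (
        "SELECT sentence_number, COUNT(*) AS shared "
        "FROM sentence_word "
        f"WHERE one_word IN ({placeholders}) "
        "GROUP BY sentence_number "
        "ORDER BY shared DESC, sentence_number "
        "LIMIT ?"
    )
    return conn.execute(sql, (*words, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, SENTENCES)
    print(search(conn, "the cat sat"))
```

Each query word becomes an indexed equality lookup, so the work scales with the number of matching rows rather than with the size of the whole table.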
Depending on performance, it could be worth regrouping by word:
(sentence_number[], one_word)
Then you could try array or hstore operations on sentence_number[]?
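The regrouping step amounts to an inverted index: each word maps to the array of sentence numbers that contain it. In PostgreSQL the grouping itself could presumably be done with array_agg(sentence_number) and GROUP BY one_word. Below is a minimal in-memory Python sketch of the same idea. The tokenizer, the function names and the overlap-count ranking are hypothetical illustrations.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split, no stemming.
    return text.lower().split()


def group_by_word(sentences):
    """Regroup (sentence_number, one_word) pairs into one_word -> sentence_number[]."""
    postings = defaultdict(list)
    for number in sorted(sentences):
        # Each sentence number appears at most once per word's array.
        for word in sorted(set(tokenize(sentences[number]))):
            postings[word].append(number)
    return dict(postings)


def search(postings, query, limit=5):
    # Fetch the sentence_number array of each query word and count overlaps.
    counts = Counter()
    for word in set(tokenize(query)):
        counts.update(postings.get(word, []))
    # Most shared words first, ties broken by sentence number.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    sentences = {
        1: "the cat sat on the mat",
        2: "a dog sat on the rug",
        3: "the cat chased the dog",
    }
    postings = group_by_word(sentences)
    print(postings["dog"])
    print(search(postings, "the cat sat"))
```

Very common words such as "the" produce long arrays. Filtering stop words before grouping would keep each lookup small.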
Cheers,
Rémi-C
2013/12/5 Janek Sendrowski <janek12(at)web(dot)de>
> Hi,
>
> I have tables with millions of sentences. Each row contains a sentence. The
> text is natural language and any language is possible, but all sentences in
> one table are in the same language.
> I have to do a similarity search on them. It has to be very fast,
> because I have to search for a few hundred sentences many times.
> The search shouldn't be context-based. It should just find sentences with
> similar words (maybe stemmed).
>
> I have already tried GiST/GIN index-based trigram search (pg_trgm
> extension), full-text search (tsearch2 extension) and pivot-based indexing
> (Fixed Query Array), but all of them are too slow or not suitable.
> Soundex and Metaphone aren't suitable either.
>
> I have been working on this project for a long time, but without any
> success.
> Do any of you have an idea?
>
> I would be very thankful for help.
>
> Janek Sendrowski
>