Re: [PERFORM] Similarity search with the tsearch2 extension

From:	"Janek Sendrowski" <janek12(at)web(dot)de>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: [PERFORM] Similarity search with the tsearch2 extension
Date:	2013-12-06 16:21:13
Message-ID:	trinity-cf3ecc79-6b1d-4bbe-a706-5553ce2e50ca-1386346873185@3capp-webde-bs06
Lists:	pgsql-general pgsql-performance

Sorry, I used AND statements instead of OR statements in the example.
I noticed that GIN is much faster than GiST, but I don't know why.

The query gets slow because many non-stop words appear very often in my sentences, in about 3% of all sentences.
Do you think it would be worth filtering out the words that appear that often and declaring them as stop words?
How would you split a sentence with, let's say, 10 non-stop words to get a well-performing similarity search?
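
A minimal sketch of the frequency filter described above, written in Python outside the database purely for illustration. The function name `frequent_words`, the sample sentences, and the simple `\w+` tokenizer are assumptions, not part of tsearch2; the 3% default only mirrors the figure mentioned above.

```python
from collections import Counter
import re


def frequent_words(sentences, threshold=0.03):
    """Return words that occur in more than `threshold` of all sentences.

    Hypothetical helper: the tokenizer is a plain regex, not the
    text search parser PostgreSQL would use.
    """
    doc_freq = Counter()
    for sentence in sentences:
        # Count each word once per sentence (document frequency).
        doc_freq.update(set(re.findall(r"\w+", sentence.lower())))
    limit = threshold * len(sentences)
    return {word for word, n in doc_freq.items() if n > limit}


# Made-up sample data; with only four sentences a high threshold is used.
sentences = [
    "the pump valve is closed",
    "open the pump valve slowly",
    "check the oil level",
    "replace the filter",
]
candidates = frequent_words(sentences, threshold=0.5)
```

The resulting set would be the list of candidate words to add to a stop-word file; whether dropping them hurts the quality of the similarity ranking would still need to be checked on the real data.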

There's still the problem with very short sentences. A partial index on them with the trigram search might be the solution.
The pg_trgm module is far too slow for longer sentences, as you showed.
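
To make the trigram idea concrete, here is a rough Python approximation of how a trigram similarity score can be computed. It is only a sketch: it assumes ASCII letters and digits, pads each word with two leading spaces and one trailing space, and scores by shared trigrams over the union. The names `trigrams` and `similarity` here are plain Python functions, not the pg_trgm implementation.

```python
import re


def trigrams(text):
    """Approximate trigram extraction (assumption: ASCII alphanumerics only)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Pad each word with two leading spaces and one trailing space.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

The number of trigrams grows with the length of the text, so a long sentence has many more trigrams to look up and compare than a short one, which fits the observation that this approach suits short sentences better.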

I thought I would build a few partial indexes based on string length to improve performance.
Do you know of any other improvements?

Janek Sendrowski

In response to

Re: Similarity search with the tsearch2 extension at 2013-12-06 14:54:35 from Kevin Grittner
