Re: Combine Top-k with similarity search extensions

From:	tim(dot)child(at)comcast(dot)net
To:	Shmagi Kavtaradze <kavtaradze(dot)s(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org
Subject:	Re: Combine Top-k with similarity search extensions
Date:	2015-11-20 16:00:30
Message-ID:	455540989.1058519.1448035230053.JavaMail.zimbra@comcast.net
Lists:	pgsql-novice

Shmagi,

Take the first 20 characters of the text, then compute and store the CRC32 or MD5 of that value. That value acts as a signature. You can then find all distinct signatures, or all rows with duplicate signatures, for further analysis. You could even try building a signature on the full text string.
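
As a rough illustration of the signature idea, here is a minimal Python sketch. It assumes the rows have already been fetched as (id, text) pairs; the function names and the sample rows are made up for the example and are not from your table. Inside PostgreSQL itself, something like md5(left(col, 20)) should give a comparable signature.

```python
import zlib
from collections import defaultdict

PREFIX_LEN = 20  # the "first 20 characters" mentioned above


def signature(text, prefix_len=PREFIX_LEN):
    """Return the CRC32 of the first prefix_len characters of text."""
    return zlib.crc32(text[:prefix_len].encode("utf-8"))


def group_by_signature(rows):
    """Map each signature to the list of row ids that share it."""
    groups = defaultdict(list)
    for row_id, text in rows:
        groups[signature(text)].append(row_id)
    return groups


def duplicate_groups(rows):
    """Return only the id lists whose signature occurs more than once."""
    return [ids for ids in group_by_signature(rows).values() if len(ids) > 1]


# Hypothetical sample data, for illustration only.
rows = [
    (1, "The quick brown fox jumps over the lazy dog"),
    (2, "The quick brown fox sleeps all day"),
    (3, "A completely different sentence"),
]

dups = duplicate_groups(rows)
print(dups)
```

Only rows that land in the same group need the expensive similarity check, which should cut the 3500 * 3500 comparisons down considerably. Note that a shared signature only means the prefixes match (or, rarely, that the hashes collide), so the groups are candidates for further analysis, not confirmed duplicates.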

----- Original Message -----

From: "Shmagi Kavtaradze" <kavtaradze(dot)s(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Sent: Friday, November 20, 2015 2:21:36 AM
Subject: [NOVICE] Combine Top-k with similarity search extensions

I am performing a similarity check over a column in a table with about 3500 entries. The column is populated with text data from a text file. Performing the check results in 3500 * 3500 rows, and it takes forever to calculate on my virtual machine. Is there any way to calculate only the top-k results, to decrease the amount of work and the time needed? What I mean is, for example, when checking two sentences, if the first several words do not match, stop checking those sentences and move on.
