I am running a similarity check over one column of a table with about 3500
entries. The column is populated with text data loaded from a text file.
Comparing every entry against every other one produces 3500 * 3500 rows
(about 12 million pairs), and the calculation takes forever on my virtual
machine. Is there a way to compute only the top-k results, to reduce both
the amount of output and the time needed? What I mean is, for example,
when comparing two sentences: if the first several words do not match,
stop comparing that pair and move on.
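
To make the idea concrete, here is a rough Python sketch of the kind of
thing I have in mind. It is only an illustration under assumptions: the
function name top_k_similar and the parameters k and prefix_words are made
up for this example, the column is assumed to be already loaded into a
list of strings, and difflib.SequenceMatcher stands in for whatever
similarity measure is actually used. A pair is skipped when its first few
words differ, and only the k best-scoring pairs are kept in a small heap
instead of storing all rows.

```python
import heapq
from difflib import SequenceMatcher
from itertools import combinations


def top_k_similar(sentences, k=10, prefix_words=3):
    """Return the k most similar pairs as (score, i, j), best first.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Tokenise each sentence once, outside the pair loop.
    prefixes = [tuple(s.lower().split()[:prefix_words]) for s in sentences]

    heap = []  # min-heap holding the best k pairs seen so far
    # combinations() yields each unordered pair once (i < j),
    # which already halves the 3500 * 3500 comparisons.
    for i, j in combinations(range(len(sentences)), 2):
        # Early exit: leading words differ, so skip the expensive check.
        if prefixes[i] != prefixes[j]:
            continue
        score = SequenceMatcher(None, sentences[i], sentences[j]).ratio()
        if len(heap) < k:
            heapq.heappush(heap, (score, i, j))
        elif score > heap[0][0]:
            # Replace the current weakest of the top k.
            heapq.heapreplace(heap, (score, i, j))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    rows = [
        "the quick brown fox jumps",
        "the quick brown fox leaps",
        "a totally different line",
        "the quick brown dog sleeps",
    ]
    for score, i, j in top_k_similar(rows, k=2):
        print(round(score, 2), rows[i], "|", rows[j])
```

The obvious catch is that the prefix test is a heuristic: two sentences
that are very similar overall but start with different words would be
skipped, so the result is a top-k among pairs sharing a prefix, not a true
top-k over all pairs.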