Re: Combine Top-k with similarity search extensions

From: "马修/Матвей/Mateo/Matt Buse" <mrbuseco(at)buseco(dot)net>
To: "Shmagi Kavtaradze" <kavtaradze(dot)s(at)gmail(dot)com>, tim(dot)child(at)comcast(dot)net
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Combine Top-k with similarity search extensions
Date: 2015-11-21 02:00:10
Message-ID: 20151120190010.1b404da78c54173e561a97484ed23969.a018702cca.wbe@email14.secureserver.net
Lists: pgsql-novice

<html><body><span style="font-family:Verdana; color:#000000; font-size:12pt;"><div>Dump the column to a file, then run a shell script or C program to sort it (sort -u).&nbsp; Searches and</div><div>comparisons work much better on sorted sets.</div><div><br></div><div>Matt<br></div>
<blockquote id="replyBlockquote" webmail="1" style="border-left: 2px solid blue; margin-left: 8px; padding-left: 8px; font-size:10pt; color:black; font-family:verdana;">
<div id="wmQuoteWrapper">
-------- Original Message --------<br>
Subject: Re: [NOVICE] Combine Top-k with similarity search extensions<br>
From: Shmagi Kavtaradze &lt;<a href="mailto:kavtaradze(dot)s(at)gmail(dot)com">kavtaradze(dot)s(at)gmail(dot)com</a>&gt;<br>
Date: Fri, November 20, 2015 8:13 am<br>
To: <a href="mailto:tim(dot)child(at)comcast(dot)net">tim(dot)child(at)comcast(dot)net</a><br>
Cc: <a href="mailto:pgsql-novice(at)postgresql(dot)org">pgsql-novice(at)postgresql(dot)org</a><br>
<br>
<div dir="ltr">It will add complexity, and I also have no idea how to do it. Is there any alternative?</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 20, 2015 at 5:00 PM, <span dir="ltr">&lt;<a href="mailto:tim(dot)child(at)comcast(dot)net" target="_blank">tim(dot)child(at)comcast(dot)net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:Arial;font-size:12pt;color:#000000"><div>Shmagi,<br></div><div><br></div><div>Take the first 20 text characters, then compute and store the CRC32 or MD5 of that value.&nbsp; That value acts as a signature. You can then find all distinct signatures, or all rows with duplicate signatures, for further analysis.&nbsp; You could even try building a signature on the full text string.<br></div><div><br></div><div><br></div><div><br></div><hr><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From: </b>"Shmagi Kavtaradze" &lt;<a href="mailto:kavtaradze(dot)s(at)gmail(dot)com" target="_blank">kavtaradze(dot)s(at)gmail(dot)com</a>&gt;<br><b>To: </b><a href="mailto:pgsql-novice(at)postgresql(dot)org" target="_blank">pgsql-novice(at)postgresql(dot)org</a><br><b>Sent: </b>Friday, November 20, 2015 2:21:36 AM<br><b>Subject: </b>[NOVICE] Combine Top-k with similarity search extensions<span class=""><br><div><br></div><div dir="ltr">I am performing a similarity check over a column in a table with about 3500 entries. The column is populated with text data from a text file. Performing the check results in 3500 * 3500 rows, and it takes forever to calculate on my virtual machine. Is there any way to calculate only the top-k results, to reduce the number of rows and the time needed? 
What I mean is that, for example, when checking two sentences, if the first several words do not match, stop checking those sentences and move on.&nbsp;</div></span></div><div><br></div></div></div></blockquote></div><br></div>
</div>
</blockquote></span></body></html>
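
The signature idea quoted in this thread (hash the first 20 characters, then group rows by that hash) can be sketched outside the database roughly as follows. This is only an illustration under assumptions: the function names `signature` and `candidate_pairs` and the sample rows are hypothetical, MD5 is picked from the two hashes mentioned, and the column is assumed to have been exported as a list of strings. Rows that differ anywhere in their first 20 characters land in different buckets and are never compared, so this trades recall for speed.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

PREFIX_LEN = 20  # "first 20 text characters", as suggested in the thread


def signature(text, prefix_len=PREFIX_LEN):
    """Return the MD5 hex digest of the first prefix_len characters."""
    return hashlib.md5(text[:prefix_len].encode("utf-8")).hexdigest()


def candidate_pairs(rows):
    """Yield (i, j) index pairs whose texts share a prefix signature.

    Only these pairs would be passed on to the expensive similarity
    function, instead of all len(rows) * len(rows) combinations.
    """
    buckets = defaultdict(list)
    for idx, text in enumerate(rows):
        buckets[signature(text)].append(idx)
    for members in buckets.values():
        yield from combinations(members, 2)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the exported column.
    rows = [
        "the quick brown fox jumps over",
        "the quick brown fox sleeps",
        "an unrelated sentence",
    ]
    for i, j in candidate_pairs(rows):
        print(i, j)
```

With 3500 rows, the number of pairs actually compared depends on how many rows share a 20-character prefix; a shorter prefix gives larger buckets and more comparisons, a longer one gives fewer.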
