From: Ben <bench(at)silentmedia(dot)com>
To: "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>
Cc: Postgresql-General mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: scoring differences between bitmasks
Date: 2005-10-02 18:49:03
Message-ID: F9149CFE-05C3-40D1-A6D8-58592A796B6E@silentmedia.com
Lists: pgsql-general
Just the number of bits, not which ones. Basically, the Hamming
distance.
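For reference, the distance being discussed can be sketched in a few lines (Python here purely for illustration; the thread itself is about computing this inside PostgreSQL):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two bitmask vectors modeled as ints.

    XOR leaves a 1 exactly where the two vectors differ, so the
    popcount of the XOR is the number of differing bits.
    """
    return bin(a ^ b).count("1")

# Two 8-bit masks differing in two positions:
print(hamming(0b10110010, 0b10011010))  # prints 2
```

The same popcount-of-XOR idea applies unchanged to the 256-bit vectors in question, since Python ints are arbitrary precision.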
On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote:
> Hi,
>
> It may be that I don't understand your problem. :)
>
> Are you searching the table for the closest vector? If so, is
> "closeness" defined only as the number of bits that are different?
> Or, do you need to know which bits as well?
>
> -- todd
>
>
> Ben wrote:
>
>> Hrm, I don't understand. Can you give me an example with some
>> reasonably sized vectors?
>> On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:
>>
>>> Hi,
>>>
>>> Try breaking the vector into 4 bigint columns and building a
>>> multi-column index, with index columns going from the most evenly
>>> distributed to the least. Depending on the distribution of your
>>> data, you may only need 2 or 3 columns in the index. If you can
>>> cluster the table in that order, it should be really fast. (This
>>> structure is a tabular form of a linked trie.)
>>>
>>> -- todd
>>>
>>>
>>> Ben wrote:
>>>
>>>
>>>> Yes, that's the straightforward way to do it. But given that
>>>> my vectors are 256 bits in length, and that I'm going to
>>>> eventually have about 4 million of them to search through, I
>>>> was hoping greater minds than mine had figured out how to do
>>>> it faster, or how to compute some kind of indexing... somehow.
>>>>
>
>
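To make the decomposition Todd suggests concrete: a 256-bit vector splits into four 64-bit chunks, each of which would become one bigint column. A minimal sketch (Python for illustration; `split_256` and the most-significant-first chunk order are assumptions of this sketch — Todd's advice is to order the index columns by how evenly distributed each chunk's values are, which depends on the data):

```python
def split_256(v: int) -> tuple:
    """Split a 256-bit integer into four 64-bit chunks, most
    significant chunk first. Each chunk fits in a signed-or-unsigned
    64-bit column once mapped appropriately."""
    mask = (1 << 64) - 1
    return tuple((v >> shift) & mask for shift in (192, 128, 64, 0))

# A vector with bits set in the top chunk, the third chunk, and the
# low chunk splits as expected:
v = (1 << 255) | (1 << 64) | 5
print(split_256(v))  # prints (9223372036854775808, 0, 1, 5)
```

Note that PostgreSQL's bigint is signed, so chunks with the high bit set would need an offset or reinterpretation before being stored; that detail is omitted here.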