From: | Connor Wolf <wolf(at)imaginaryindustries(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Understanding and implementing a GiST Index |
Date: | 2014-10-10 03:09:07 |
Message-ID: | 54374DD3.40308@imaginaryindustries.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
I had skimmed the presentation slides, but I hadn't looked that closely
because it appeared to be using cosine similarity metrics, and only
operated on matrices, neither of which are what I wanted.
On closer examination, I think I could explode my packed hash values to
a matrix. I'm not sure how the cosine similarity metric would work given
that the arrays would only contain the values either 0 or 1 (as my hash
value is fundamentally a integer of configurable length (you can alter
the hash size, I'm using 64 bits)).
I'll check out the code and poke it a bit.
I'll probably also move this discussion to the hackers mailing list (the
instructions say to ask here first, but Oleg suggested I go there, and I
generally agree).
Connor
On 10/9/2014 8:19 AM, Sergey Konoplev wrote:
> On Thu, Oct 9, 2014 at 12:09 AM, Connor Wolf
> <wolf(at)imaginaryindustries(dot)com> wrote:
>> I'm trying to implement a custom indexing system using the GiST index
>> framework, and I have a few questions.
>> Basically, I'm trying to implement a system that allows me to search across
>> a set of 64 bit integers by hamming distance. This is intended to be used in
>> searching for similar images, where the 64 bit integer is actually a phash
>> of an image, and similarity between two images is reflected in the hamming
>> distance between two integers.
> Have you seen the smlar extension?
>
> http://www.pgcon.org/2012/schedule/events/443.en.html
> http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=blob;hb=HEAD;f=README
> http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
>
>> Anyways, The appropriate approach here is to use something called a BK tree,
>> for which I've written some test code and I think I have a decent grip of
>> (my test code seems to work, in any event).
>>
>> That said:
>>
>> Is there a simple piece of example-code for implementing a GiST index for a
>> single index? I've been looking through the /contrib/ directory,
>> particularly the /contrib/btree_gist/ files, but it looks like 90% of the
>> complexity there is related to supporting all the column types Postgres has,
>> rather then actually tied to producing a functional index.
>> Once I have something compiling, how can I check to be sure that I'm
>> actually using the indexing module I created? I can enable my compiled
>> extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX
>> test_index ON testing USING gist (val);` that lets me specify *which* GiST
>> index is actually employed? Is this even a valid question?
>> This is probably something that's available in one of the system tables
>> somewhere, but I haven't had much luck with google finding out where.
>> Testing: What's the appropriate way to examine the generated tree structure
>> of the index? I certainly went through a few bugs with my test tree system
>> (and that was in python!). Is there any way to examine the index structure
>> for debugging purposes?
>> Also, it looks like `ereport()` is the proper way to emit debugging
>> information. Is this correct?
>> In that vein, is there any way to have information that is on the operations
>> of an entire query? Looking at the number of tree nodes touched for a scan
>> would be nice (and I would not be surprised if there is already a facility
>> for it).
>>
>> Project code is here if anyone is interested, any help would be great. I
>> have very little idea what I'm doing.
>>
>> Thanks,
>> Connor
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Connor Wolf | 2014-10-10 07:03:56 | Re: [GENERAL] Understanding and implementing a GiST Index |
Previous Message | Adrian Klaver | 2014-10-09 22:09:11 | Re: PostgreSQL server won't start, corrupt? |
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2014-10-10 04:34:34 | Re: [HACKERS] schema-only -n option in pg_restore fails |
Previous Message | Peter Geoghegan | 2014-10-10 01:35:33 | Re: Last Commitfest patches waiting review |