| From: | Sergey Konoplev <gray(dot)ru(at)gmail(dot)com> | 
|---|---|
| To: | Connor Wolf <wolf(at)imaginaryindustries(dot)com> | 
| Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: Understanding and implementing a GiST Index | 
| Date: | 2014-10-09 15:19:36 | 
| Message-ID: | CAL_0b1uzsmXPPtSGfR7WJcu=JE_POYAw=AxagDOeZbNUcYfG5w@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general pgsql-hackers | 
On Thu, Oct 9, 2014 at 12:09 AM, Connor Wolf
<wolf(at)imaginaryindustries(dot)com> wrote:
> I'm trying to implement a custom indexing system using the GiST index
> framework, and I have a few questions.
> Basically, I'm trying to implement a system that allows me to search across
> a set of 64 bit integers by hamming distance. This is intended to be used in
> searching for similar images, where the 64 bit integer is actually a phash
> of an image, and similarity between two images is reflected in the hamming
> distance between two integers.
Have you seen the smlar extension?
http://www.pgcon.org/2012/schedule/events/443.en.html
http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=blob;hb=HEAD;f=README
http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
>
> Anyways, The appropriate approach here is to use something called a BK tree,
> for which I've written some test code and I think I have a decent grip of
> (my test code seems to work, in any event).
>
> That said:
>
> Is there a simple piece of example-code for implementing a GiST index for a
> single index? I've been looking through the /contrib/ directory,
> particularly the /contrib/btree_gist/ files, but it looks like 90% of the
> complexity there is related to supporting all the column types Postgres has,
> rather then actually tied to producing a functional index.
> Once I have something compiling, how can I check to be sure that I'm
> actually using the indexing module I created? I can enable my compiled
> extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX
> test_index ON testing USING gist (val);` that lets me specify *which* GiST
> index is actually employed? Is this even a valid question?
> This is probably something that's available in one of the system tables
> somewhere, but I haven't had much luck with google finding out where.
> Testing: What's the appropriate way to examine the generated tree structure
> of the index? I certainly went through a few bugs with my test tree system
> (and that was in python!). Is there any way to examine the index structure
> for debugging purposes?
> Also, it looks like `ereport()` is the proper way to emit debugging
> information. Is this correct?
> In that vein, is there any way to have information that is on the operations
> of an entire query? Looking at the number of tree nodes touched for a scan
> would be nice (and I would not be surprised if there is already a facility
> for it).
>
> Project code is here if anyone is interested, any help would be great. I
> have very little idea what I'm doing.
>
> Thanks,
> Connor
-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Janes | 2014-10-09 16:55:03 | Re: Optimal checkpoint_setting | 
| Previous Message | hari.fuchs | 2014-10-09 12:12:25 | Re: Converting char to varchar automatically | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2014-10-09 15:21:07 | Re: [9.4 bug] The database server hangs with write-heavy workload on Windows | 
| Previous Message | Tom Lane | 2014-10-09 15:08:37 | Re: Proposal for better support of time-varying timezone abbreviations |