From: | Connor Wolf <wolf(at)imaginaryindustries(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Understanding and implementing a GiST Index |
Date: | 2014-10-09 07:09:58 |
Message-ID: | 543634C6.5000005@imaginaryindustries.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Hi there!
I'm trying to implement a custom indexing system using the GiST index
framework, and I have a few questions.
Basically, I'm trying to implement a system that allows me to search
across a set of 64 bit integers by hamming distance. This is intended to
be used in searching for similar images, where the 64 bit integer is
actually a phash
<http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html> of
an image, and similarity between two images is reflected in the hamming
distance between two integers.
Anyways, The appropriate approach here is to use something called a BK
tree
<http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees>,
for which I've written some test code
<https://github.com/fake-name/MangaCMS/blob/master/deduplicator/hamDb.py#L27>
and I think I have a decent grip of (my test code seems to work, in any
event).
That said:
* Is there a /simple/ piece of example-code for implementing a GiST
index for a single index? I've been looking through the /contrib/
directory, particularly the /contrib/btree_gist/ files, but it looks
like 90% of the complexity there is related to supporting all the
column types Postgres has, rather then actually tied to producing a
functional index.
* Once I have something compiling, how can I check to be sure that I'm
actually using the indexing module I created? I can enable my
compiled extension using `CREATE EXTENSION`, but is there an option
for `CREATE INDEX test_index ON testing USING gist (val);` that lets
me specify *which* GiST index is actually employed? Is this even a
valid question?
This is probably something that's available in one of the system
tables somewhere, but I haven't had much luck with google finding
out where.
* Testing: What's the appropriate way to examine the generated tree
structure of the index? I certainly went through a few bugs with my
test tree system (and that was in python!). Is there any way to
examine the index structure for debugging purposes?
* Also, it looks like `ereport()` is the proper way to emit debugging
information. Is this correct?
* In that vein, is there any way to have information that is on the
operations of an entire query? Looking at the number of tree nodes
touched for a scan would be nice (and I would not be surprised if
there is already a facility for it).
Project code is here <https://github.com/fake-name/pg-spgist_hamming> if
anyone is interested, any help would be great. I have very little idea
what I'm doing.
Thanks,
Connor
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Joseph Krogh | 2014-10-09 07:58:43 | Re: Importing large object (with lo_import) to table in separate tablespaces takes up lots of space in $PGDATA |
Previous Message | Andrus | 2014-10-09 05:41:52 | Re: Converting char to varchar automatically |
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2014-10-09 07:38:01 | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Previous Message | Heikki Linnakangas | 2014-10-09 06:47:45 | Re: What exactly is our CRC algorithm? |