From: Volkan YAZICI <yazicivo(at)ttnet(dot)net(dot)tr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: BK-Tree Implementation on top of GiST
Date: 2007-10-28 11:56:34
Message-ID: 87bqajcv0t.fsf@ttnet.net.tr
Lists: pgsql-hackers
Hi,

In an address-search framework at a company, we need to deal with
queries that include potential spelling errors. Even after applying
some external constraints on the address space (e.g. matching first
letters, word length, etc.), we still end up with a huge data set to
filter through Levenshtein-like distance metrics.
Sequentially scanning a record set of roughly 100,000 entries through
some sort of distance metric is not something we'd want in
production. For this purpose, I plan to implement BK-trees[1] on top
of GiST, which will (technically) reduce the search complexity from
O(n) to O(log n). As far as I'm concerned, such work will be worth
the time it takes compared to the reduction in overhead it will bring.
[1] W. A. Burkhard and R. M. Keller, "Some approaches to best-match
    file searching", http://portal.acm.org/citation.cfm?id=362003.362025
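To make the idea concrete, here is a rough stand-alone sketch of a
BK-tree in C. It only illustrates the pruning that gives the speed-up,
not the GiST code itself; all names such as bk_node, bk_insert,
bk_search and the MAX_DIST bound are made up for the example:

#include <stdlib.h>

#define MAX_DIST 64                     /* assumed upper bound on the metric */

typedef int (*metric_fn)(const void *a, const void *b);

typedef struct bk_node
{
    const void     *key;
    struct bk_node *child[MAX_DIST + 1];    /* child[d] = subtree at distance d */
} bk_node;

static bk_node *
bk_new(const void *key)
{
    bk_node    *n = calloc(1, sizeof(bk_node));

    n->key = key;
    return n;
}

static void
bk_insert(bk_node *root, const void *key, metric_fn dist)
{
    int         d = dist(root->key, key);   /* assumed to stay <= MAX_DIST */

    if (d == 0)
        return;                             /* duplicate key */
    if (root->child[d])
        bk_insert(root->child[d], key, dist);
    else
        root->child[d] = bk_new(key);
}

/*
 * Report every key within `radius` of `query`.  The triangle inequality
 * lets us skip every subtree whose edge label lies outside
 * [d - radius, d + radius], which is where the sub-linear behaviour
 * comes from.
 */
static void
bk_search(const bk_node *root, const void *query, int radius,
          metric_fn dist, void (*emit)(const void *key))
{
    int         d = dist(root->key, query);
    int         lo = (d - radius < 0) ? 0 : d - radius;
    int         hi = (d + radius > MAX_DIST) ? MAX_DIST : d + radius;

    if (d <= radius)
        emit(root->key);

    for (int i = lo; i <= hi; i++)
        if (root->child[i])
            bk_search(root->child[i], query, radius, dist, emit);
}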
Anyway, I have some experience with the source code of the intarray
module. Does anybody have suggestions/warnings/comments about such a
project? Would the PostgreSQL team welcome such a patch for
integration into the fuzzystrmatch module, or should I create my own
project at pgfoundry?
BTW, as you'd imagine, the implementation won't be specific to
Levenshtein; any distance metric on any kind of data will be able to
benefit from BK-trees.
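For instance, with the sketch above nothing ties the tree to strings
or to Levenshtein; any function that satisfies the metric axioms (in
particular the triangle inequality) can be plugged in. A made-up
example with a Hamming metric on fixed-width 8-byte keys:

/* Hamming distance on 8-byte keys; illustrative only */
static int
hamming8(const void *a, const void *b)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    int         d = 0;

    for (int i = 0; i < 8; i++)
        d += __builtin_popcount(x[i] ^ y[i]);
    return d;
}

/* usage: bk_insert(root, key, hamming8);
 *        bk_search(root, probe, 2, hamming8, emit); */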
Regards.