From: | mlw <markw(at)mohawksoft(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers List <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: OK, does anyone have any better ideas? |
Date: | 2000-12-09 13:18:10 |
Message-ID: | 3A323112.FCF35B3B@mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-novice |
Oleg Bartunov wrote:
>
> We need multi-key B-tree like index for such problem.
> Our full text search engine is based on arrays and we need to find quickly
> is some number exists in array - some kind of index over int array.
> We're currently testing GiST approach and seems will have some conclusions
> soon. I think multi-key B-tree like index would be better in my
> opinion, but this requires to much work and knowledge of postgres's internals.
> Yesterday I read about UBTree, seems like it's good for index and query
> sets. Currently postgres has no set specific methods.
The way I do my search indexing is with bitmap objects and a word
dictionary. One creates a searchable dictionary of words by scanning the
selected records. So, in one query that results in 2 million records,
the total aggregate number of words is about 60,000 depending on how you
parse. For each word, you create a "bitmap object" (in one of a few
forms) where bit '0' represents the first record, bit '1' represents the
second, and so on, until you have 2 million bits.
Set the correct bit in the bitmap for each document that contains that
word. In the end you will have the equivalent 60,000 bitmaps or 2
million bits.
During search time, one creates an empty bitmap of 2 million bits as a
work space. One parses the search term, and performs boolean operation
on the workspace from the bitmap retrieved for each word.
When you are done parsing, you have a bitmap with a bit set for each
document that fits the search criteria. You then enumerate the bits by
bit position, and you now have a list of document numbers.
If only I could get the list of document numbers back into
postgres....... It would be great.
--
http://www.mohawksoft.com
From | Date | Subject | |
---|---|---|---|
Next Message | Hiroshi Inoue | 2000-12-09 13:23:09 | RE: [HACKERS] Japan pictures |
Previous Message | Oleg Bartunov | 2000-12-09 12:40:53 |
From | Date | Subject | |
---|---|---|---|
Next Message | Oleg Bartunov | 2000-12-09 13:41:37 | Re: OK, does anyone have any better ideas? |
Previous Message | Oleg Bartunov | 2000-12-09 12:40:53 |