From: | Eric Ridge <ebr(at)tcdi(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Postgres + Xapian (was Re: fulltext searching via a custom index type ) |
Date: | 2004-01-05 16:00:36 |
Message-ID: | 4CF83855-3F98-11D8-ADB4-000A95BB5944@tcdi.com |
Lists: | pgsql-general pgsql-hackers |
On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote:
> I think your approach is too ugly. You will have tons of problems the
> minute you start thinking about concurrency (unless you want to allow
> only a single user accessing the index)
It might be ugly, but it's very fast. Surprisingly fast, actually.
Concerning concurrency, Xapian internally supports multiple readers and
only 1 concurrent writer. So the locking requirements should be far
less complex than a true concurrent solution. Now, I'm not arguing
that this is ideal, but if Xapian is a search engine you're interested in,
then you've already made up your mind that you're willing to deal with
1 writer at a time.
However, Xapian does have built-in support for searching multiple
databases at once. One thought I've had is to simply create a new
1-document database on every INSERT/UPDATE beyond the initial CREATE
INDEX. Then whenever you do an index scan, tell Xapian to use all the
little databases that exist in the index. This would give some bit of
concurrency. Then on VACUUM (or VACUUM FULL), all these little databases
could be merged back into the main index.
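
A rough sketch of that idea, as a toy model in plain Python rather than
against Xapian's real API. The names TinyIndex, SegmentedIndex, insert and
vacuum are made up for illustration, and replacing the old version of a
document on UPDATE is deliberately left out:

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids (stand-in for one database)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries in the defaultdict
        return set(self.postings.get(term.lower(), ()))


class SegmentedIndex:
    """Main index plus one small side index per write after the initial build."""

    def __init__(self, initial_docs):
        self.main = TinyIndex()  # built once, as CREATE INDEX would
        for doc_id, text in initial_docs.items():
            self.main.add(doc_id, text)
        self.segments = []  # one 1-document index per later INSERT/UPDATE

    def insert(self, doc_id, text):
        # A writer only touches its own new segment, never the main index.
        seg = TinyIndex()
        seg.add(doc_id, text)
        self.segments.append(seg)

    def search(self, term):
        # An index scan consults the main index and every small segment.
        hits = self.main.search(term)
        for seg in self.segments:
            hits |= seg.search(term)
        return hits

    def vacuum(self):
        # Fold all small segments back into the main index.
        for seg in self.segments:
            for term, ids in seg.postings.items():
                self.main.postings[term] |= ids
        self.segments = []
```

Searches return the same hits before and after vacuum(); only the number of
places that have to be consulted changes.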
> and recovery (unless you want to force users to REINDEX when the
> system crashes).
I don't yet understand how the WAL stuff works. I haven't looked at
the APIs yet, but if something you can record is "write these bytes to
this BlockNumber at this offset", or if you can say, "index Tuple X
from Relation Y", then it seems like recovery is still possible.
If ya can't do any of that, then I need to go look at WAL further.
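
To make the first kind of record concrete, here is a minimal sketch of
"write these bytes to this BlockNumber at this offset" being logged and then
replayed. This is not PostgreSQL's WAL API, which I haven't read yet;
RedoRecord, apply_record, replay and the 8192-byte block size are all
assumptions for the example:

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192  # assumed block size for this sketch


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical log record: put `data` at `offset` inside block `block_number`."""
    block_number: int
    offset: int
    data: bytes


def apply_record(blocks, rec):
    """Apply one record to an in-memory map of block number -> bytearray."""
    if rec.offset < 0 or rec.offset + len(rec.data) > BLOCK_SIZE:
        raise ValueError("record does not fit inside one block")
    page = blocks.setdefault(rec.block_number, bytearray(BLOCK_SIZE))
    page[rec.offset:rec.offset + len(rec.data)] = rec.data


def replay(log):
    """Rebuild block contents by applying every record in log order."""
    blocks = {}
    for rec in log:
        apply_record(blocks, rec)
    return blocks
```

Replaying the same log from the start always produces the same blocks, which
is the property recovery would depend on.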
> I think one way of attacking the problem would be using the existing
> nbtree by allowing it to store the five btrees. First read the README
> in the nbtree dir, and then poke at the metapage's only structure. You
> will see that it has a BlockNumber to the root page of the index.
Right, I had gotten this far in my investigation already. The daunting
thing about trying to use the nbtree code is the code itself. It's
very complex. Plus, I just don't know how well the rest of Xapian
would respond to all of a sudden having a concurrent backend. It's
likely that it would make no difference, but it's just an unknown to me
at this time.
> Try modifying that to make it have a BlockNumber to every index's root
> page.
> You will have to provide ways to access each root page and maybe other
> nonstandard things (such as telling the root split operation what root
> page are you going to split), but you will get recovery and concurrency
> (at least to a point) for free.
And I'm not convinced that recovery and concurrency would be "for free"
in this case either. The need to keep essentially 5 different trees in
sync greatly complicates the concurrency issue, I would think.
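
For what it's worth, the metapage change suggested above might look roughly
like this: one root BlockNumber per tree instead of a single one. The layout,
the magic number and the function names are invented for the example and are
not nbtree's actual metapage format. It also does nothing about keeping the
five trees in sync:

```python
import struct

NUM_TREES = 5                 # one root per btree, as discussed in the thread
INVALID_BLOCK = 0xFFFFFFFF    # assumed marker for "this tree has no root yet"
META_MAGIC = 0x58415049       # arbitrary value chosen for this sketch
META_FORMAT = "<I%dI" % NUM_TREES  # magic, then one 32-bit root block per tree
META_SIZE = struct.calcsize(META_FORMAT)


def pack_metapage(roots):
    """Serialize the list of root block numbers into metapage bytes."""
    if len(roots) != NUM_TREES:
        raise ValueError("expected %d root block numbers" % NUM_TREES)
    return struct.pack(META_FORMAT, META_MAGIC, *roots)


def unpack_roots(raw):
    """Read the root block numbers back out of metapage bytes."""
    magic, *roots = struct.unpack(META_FORMAT, raw[:META_SIZE])
    if magic != META_MAGIC:
        raise ValueError("not a metapage for this sketch")
    return roots


def set_root(raw, tree, new_root):
    """Return new metapage bytes after tree number `tree` gets a new root."""
    roots = unpack_roots(raw)
    roots[tree] = new_root
    return pack_metapage(roots)
```

A root split would have to say which tree it is splitting, which is the
`tree` argument to set_root here.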
thanks for your time!
eric