Quick Links

Re: 8.3 full text search docs

From:	Albert Cervera i Areny <albert(at)nan-tic(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	andy <andy(at)squeakycode(dot)net>
Subject:	Re: 8.3 full text search docs
Date:	2007-10-13 22:24:49
Message-ID:	200710140024.49931.albert@nan-tic.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Andy,
note that documentation is discussed in the pgsql-docs list and patches
usually are submitted to the pgsql-patches list. Nice to see both new
sections, by the way.

A Diumenge 14 Octubre 2007, andy va escriure:
> I have two doc updates I'd like to offer. I see we have two example
> sections: creating rule-based dict's and creating parsers. When I was
> starting I would have liked to see an example usage.
>
> I'd like to offer: "example usage" and "Upgrading".
> This is my first draft, if anyone has suggestions I'd be interested in
> hearing them. Also, I'm not sure where or who to send this to, so I
> hope -hacker is ok.
>
> ----- Example Usage -----
> Staring a new project with Full Text Searching is easy. There is
> nothing to install anymore, its all built in (in fact, don't install the
> contrib module tsearch2 because it will conflict with the tsearch2 built
> into the core).
>
> We need to add a new column of type tsvector to the table you'd like to
> search. In this example we'll use a table called notes. If your table
> exists use:
>
> alter table notes add searchvec tsvector;
>
> If not use:
>
> create table notes (
> rowid integer,
> note text,
> searchvec tsvector
> );
>
> The searchvec column is what we will use for searching, so you probably
> want to create an index on it... from another place in the manual:
>
> (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)
>
> GiST indexes are very good for dynamic data and fast if the number of
> unique words (lexemes) is under 100,000,
> GIN indexes are best for static data with +100,000 lexemes.
>
> For our example, I'll create a gist index:
>
> create index notesvec on notes using gist(searchvec);
>
> If you have existing data, we need to fill the searchvec column:
>
> update notes set searchvec = to_tsvector(note);
>
> After the update, any rows are inserted or updated will not have their
> searchvec column set automatically, for that we need to create a trigger:
>
> create trigger notevecupdate
> before insert or update on notes
> for each row
> execute procedure
> tsvector_update_trigger(searchvec, 'pg_catalog.english', note);
>
>
> Some data:
> insert into notes(rowid, note) values(1, 'this is a test');
>
> insert into notes(rowid, note)
> values(2, 'I do not like green eggs and ham');
>
> insert into notes(rowid, note) values(3, 'the cat in the hat');
>
> insert into notes(rowid, note)
> values(4, 'rage against the dying of the light');
>
> And now we can query it:
>
> select * from notes where searchvec @@ to_tsquery('light');
>
> or
>
> select * from notes, to_tsquery('test') as q where searchvec @@ q;
>
> writing it this way lets you reuse the tsquery "q" like this:
>
> select note, ts_rank(searchvec, q)
> from notes, to_tsquery('test') as q
> where searchvec @@ q
> order by ts_rank(searchvec, q);
> ----- Example Usage -----
>
>
>
> ----- Upgrade from prior versions -----
>
> When tsearch2 was put into core, some functions and types were renamed,
> among other things. A simple backup and restore will not work to
> migrate your database from versions below 8.3 to 8.3.
>
> In general, the way to do it is backup all your data without the
> tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
> by hand.
>
> (Its easier to think of this as an upgrade from tsearch2 to tsearch3,
> but without the whole renaming it to tsearch3 thing)
>
> To make it a little easier, there is a way using the pg_restore to
> selective restore everything that is not in the old tsearch2.
>
> First you must use "pg_dump -Fc" to backup your existing database.
>
> Then we will create an item list of things we want to restore using this
> perl script. It will strip out all the things that look like tsearch2,
> and return (to stdout) a list of things you should restore.
>
> For example:
>
> pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
> perl maketoc.pl ubberbase.bak > toc
> # now restore just the items in toc
> pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak
>
> There is one thing that will fail, that's the trigger you used to update
> the tsvector column. Its because the function tsearch2 was renamed to
> tsvector_update_trigger. You'll need to recreate the trigger by hand.
>
> Now that the structures and data are restored, you'll need to go through
> and redo the tsearch2 stuff by hand.
>
> After you get the database fixed up, you'll also need to update your
> client programs (php, perl, etc). For the most part, just renameing
> things (like rank to ts_rank) should be all that's required.
>
>
> Oleg Bartunov has an incomplete list of items that have been renamed:
>
> http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
>
> ----- Upgrade from prior versions -----

--
Albert Cervera i Areny
http://www.NaN-tic.com

In response to

8.3 full text search docs at 2007-10-13 22:00:06 from andy

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2007-10-13 23:09:22	Re: Seems we need a post-beta1 initdb already
Previous Message	Gregory Stark	2007-10-13 22:02:16	ABIs are hard