two questions about fulltext searchign / tsvector indexes

From: Jonathan Vanasco <postgres(at)2xlp(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: two questions about fulltext searchign / tsvector indexes
Date: 2014-06-10 00:55:01
Message-ID: 72D88361-2D6A-452F-A86B-6974E36C94EF@2xlp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm having some issues with fulltext searching.

I've gone though the list archives and stack overflow, but can't seem to get the exact answers. hoping someone can help.

Thanks in advance and apologies for these questions being rather basic. I just felt the docs and some online posts are leading me into possibly making the wrong decision and I want to make sure I"m doing this right.

1. I need to make both 'title' and 'description' searchable. What is the current proper way to index multiple columns of a table ( ie, not one ) ?

I've essentially seen the following in the docs, mailing list, and various websites:

A unified index
CREATE INDEX CONCURRENTLY unified_tsvector_idx ON mytable USING gin(to_tsvector('english', title || ' ' || description ));

Individual indexes
CREATE INDEX CONCURRENTLY title_tsvector_idx ON mytable USING gin(to_tsvector('english', title ));
CREATE INDEX CONCURRENTLY description_tsvector_idx ON mytable USING gin(to_tsvector('english', description ));

Using dedicated columns ( one or more )
ALTER TABLE ....
create trigger ....

I can't figure out which one to use. This is on a steadily growing table of around 20MM rows that gets 20-80k new records a day, but existing records are rarely updated.

2. I've been getting a handful of 'can not index words longer than 2047 characters' in my tests.

if this 2047 character max is on tokens, is there a way to lower it? or to profile the index for distribution of tokens ? I don't think we have to support any tokens larger than 20chars or so.

3a. What should EXPLAIN ANALYZE show if it is using the index ? i couldn't find an example.

3b. Depending on how I index the column, what do I need to pass into the query so that it uses the index ?

1. if the index is created like
gin(to_tsvector('english', title ));

do i have to search in this format ?
to_tsvector('english',title) @@ to_tsquery('english', 'dog') ;

2. if i use an index like
gin(to_tsvector('english', title || ' ' || description ));

what is the correct way to query the database and let the planner know I want to use the index ?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Khangelani Gama 2014-06-10 05:02:28 Re: pg_standby replication problem
Previous Message Adrian Klaver 2014-06-09 23:41:49 Re: pg_standby replication problem