Re: Simplifying the tsvector format for simple glossaries

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Simplifying the tsvector format for simple glossaries
Date: 2012-01-29 22:06:34
Message-ID: Pine.LNX.4.64.1201300204110.12612@sn.sai.msu.ru
Lists: pgsql-general

Always use strip(to_tsvector()) and you'll be happy. Stop words affect
tsvector size, so if you don't search for them, don't store them.

The strip() function, which removes positions and weights from a
tsvector, is described in the docs.
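For illustration only, here is a rough Python emulation of what strip() does to the text form of a tsvector. It assumes the output form shown in the question below, where every lexeme is single-quoted and optionally followed by positions and weights such as :1,3A. The function name strip_positions is hypothetical; inside the database you would simply call strip() itself.

```python
import re

# One entry of a tsvector in PostgreSQL's *output* form, e.g. 'lex_1':1,3A
#   group 1: the single-quoted lexeme (embedded quotes appear as '')
#   group 2: optional positions/weights, which strip() throws away
_ENTRY = re.compile(r"('(?:[^']|'')*')(:\d+[A-D]?(?:,\d+[A-D]?)*)?")


def strip_positions(tsvector_text: str) -> str:
    """Rough emulation of strip() on tsvector output text: keep the lexemes,
    drop positions and weights."""
    return " ".join(m.group(1) for m in _ENTRY.finditer(tsvector_text))


if __name__ == "__main__":
    print(strip_positions("'lex_1':1 'lex_2':2 'lex_3':3,7B"))
    # prints: 'lex_1' 'lex_2' 'lex_3'
```

The result is the same kind of position-free value as the simpler format asked about in the quoted message.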

Oleg

On Sun, 29 Jan 2012, Marc Mamin wrote:

>
> Hello,
>
> We have a text search on data from error logs, and our application
> offers a rather simple search on lexemes only (no weighting, no
> proximity matching ...).
> This works quite well, except when the applications generating the logs
> go mad and we have to handle millions of messages per day :-)
> We also have an ETL (Perl) tool that first transforms the logs into CSV
> files for COPY.
>
> My idea is to let Perl create a list of single words for each message
> and run the search only on these "glossaries".
> Going further, I'd like to import these lists directly as tsvectors to
> save a processing step within Postgres.
>
> The standard tsvector representation in the CSV would then look like:
>
> 'lex_1':1 'lex_2':2 'lex_3':3 ...
>
> When casting from text to tsvector, I've noticed with 9.1 that this
> simpler format is valid too:
>
> 'lex_1 lex_2 lex_3 ...'
>
> So, my questions:
> Is it safe to define tsvectors that way, or should I expect problems
> with future releases being stricter about the tsvector format?
>
> Do I have to respect the lexeme ordering within a tsvector (and if so,
> according to which NLS sort order)?
>
> Is it an issue if some tsvectors contain stop words, or is it just
> annoying noise?
>
> If this simplification is fine, I'd suggest adding a description of
> this possible tsvector representation to the docs.
>
> best regards,
>
> Marc Mamin
>
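A minimal sketch of the ETL step described in the question, written in Python rather than the Perl tool mentioned there. It builds one position-free tsvector literal per log message, drops stop words as suggested in the reply above, and writes the result as a CSV column for COPY. The tokenizer regex, the stop-word list and all function names are hypothetical assumptions; the tokenizer does no stemming, so its lexemes will not match what to_tsvector() produces, and search terms must be normalized the same way (here: lowercased).

```python
import csv
import io
import re

# Hypothetical tokenizer: runs of Unicode letters, digits and underscores,
# lowercased. This is NOT PostgreSQL's text-search parser: no stemming,
# no dictionaries, so lexemes differ from what to_tsvector() produces.
_WORD = re.compile(r"\w+")

# Hypothetical stop-word list; words you never search for need not be stored.
STOP_WORDS = frozenset({"a", "an", "the", "of", "to", "in", "is"})


def quote_lexeme(lexeme: str) -> str:
    """Quote one lexeme for tsvector input (embedded quotes and backslashes
    are doubled). With the tokenizer above this only adds the outer quotes,
    but it keeps the output valid if the tokenizer is changed."""
    return "'" + lexeme.replace("\\", "\\\\").replace("'", "''") + "'"


def glossary_tsvector(message: str, stop_words=STOP_WORDS) -> str:
    """Build a position-free tsvector literal ("glossary") for one message."""
    words = {w.lower() for w in _WORD.findall(message)}
    words.difference_update(stop_words)
    # PostgreSQL sorts and de-duplicates lexemes on input anyway; sorting
    # here only makes the generated files reproducible.
    return " ".join(quote_lexeme(w) for w in sorted(words))


def write_copy_csv(rows, out) -> None:
    """Write (id, message) pairs to `out` as CSV rows: id, tsvector text."""
    writer = csv.writer(out, lineterminator="\n")
    for msg_id, message in rows:
        writer.writerow([msg_id, glossary_tsvector(message)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_copy_csv([(1, "ERROR: Connection to db01 lost; retry 3 of 5 failed")], buf)
    print(buf.getvalue(), end="")
    # prints: 1,'3' '5' 'connection' 'db01' 'error' 'failed' 'lost' 'retry'
```

The resulting file could then be loaded with something like COPY log_glossary (id, lexemes) FROM '/path/to/file.csv' WITH (FORMAT csv), where log_glossary and its columns are hypothetical.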

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
