Re: tsvector limitations

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Johnson" <mark(at)remingtondatabasesolutions(dot)com>
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Tim" <elatllat(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org, "Greg Williamson" <gwilliamson39(at)yahoo(dot)com>
Subject: Re: tsvector limitations
Date: 2011-06-15 18:31:28
Message-ID: 18955.1308162688@sss.pgh.pa.us
Lists: pgsql-admin

"Mark Johnson" <mark(at)remingtondatabasesolutions(dot)com> writes:
> When this discussion first started, I immediately thought about people
> who full text index their server's log files. As a test I copied
> /var/log/messages to $PGDATA and then used the same pg_read_file()
> function you mentioned earlier to pull the data into a column of type
> text. The original file was 4.3 MB, and the db column had length
> 4334920 and the function pg_column_size reported a size of 1058747. I
> then added a column named tsv of type tsvector, and populated it using
> to_tsvector(). The function pg_column_size reported 201557. So in this
> test a 4.3 MB text file produced a tsvector of size 200 KB. If this
> scales linearly,

... which it won't. There is no real-world full text indexing
application where there aren't many duplications of words. (The OP
eventually admitted that his "test case" was a dictionary word list
and not an actual document.) Any discussion of required tsvector
sizes that doesn't account for the actual, nonlinear scaling behavior
isn't worth the electrons it's printed on.
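
For anyone who wants to repeat Mark's measurement, a minimal sketch
(table and file names here are illustrative, not from the original
test; it assumes the file was copied into $PGDATA and the session has
the superuser privileges pg_read_file() requires):

    -- Load the log file into a text column.
    -- pg_read_file(filename, offset, length); 10485760 = 10 MB cap.
    CREATE TABLE log_test (body text, tsv tsvector);
    INSERT INTO log_test (body)
      SELECT pg_read_file('messages', 0, 10485760);

    -- Build the tsvector from the raw text.
    UPDATE log_test SET tsv = to_tsvector('english', body);

    -- Compare raw length with on-disk (TOAST-compressed) sizes.
    SELECT length(body)         AS raw_chars,
           pg_column_size(body) AS text_bytes,
           pg_column_size(tsv)  AS tsvector_bytes
      FROM log_test;

Because the vocabulary of a real document grows far more slowly than
its length, tsvector_bytes should grow sublinearly as larger files
are fed in, which is the point about scaling above.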

regards, tom lane
