From: | Tim <elatllat(at)gmail(dot)com> |
---|---|
To: | Mark Johnson <mark(at)remingtondatabasesolutions(dot)com> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: tsvector limitations |
Date: | 2011-06-14 01:58:54 |
Message-ID: | BANLkTiniXCCAdwD0qDXb3mqLSSQrzqKSgQ@mail.gmail.com |
Lists: | pgsql-admin |
Mark,
That link is a mirror of this mailing list; it's not from 5 months ago.
If you are in the year 2012, please respond with lottery numbers and the
like.
On Mon, Jun 13, 2011 at 9:43 PM, Mark Johnson <mark(at)remingtondatabasesolutions(dot)com> wrote:
>
>
> I found another post where you asked the same questions 5 months ago. Have
> you tested in that time?
> http://www.spinics.net/lists/pgsql-admin/msg19438.html
>
>
> A text search vector is an array of distinct lexemes (less any stopwords)
> and their positions. Taking your example we get ...
>
> select to_tsvector('the lord of the rings.txt') "answer";
> answer
> -------------------
> 'lord':2 'rings.txt':5
>
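A toy Python model of that idea, not PostgreSQL's parser or dictionaries: it lowercases, splits on whitespace and drops words found in a small hypothetical STOPWORDS set, whereas a real text search configuration also classifies tokens and stems words. A dropped stopword still uses up a position, which is why 'lord' sits at 2 and 'rings.txt' at 5.

```python
# Toy model of to_tsvector(): NOT PostgreSQL's parser or dictionaries.
# STOPWORDS is a hypothetical stand-in for the configuration's stopword list.
STOPWORDS = {"a", "an", "and", "of", "the"}


def toy_to_tsvector(text: str) -> dict[str, list[int]]:
    """Map each distinct lexeme to the 1-based positions where it occurs."""
    lexemes: dict[str, list[int]] = {}
    for position, word in enumerate(text.lower().split(), start=1):
        if word in STOPWORDS:
            continue  # dropped, but it still used up a position
        lexemes.setdefault(word, []).append(position)
    return lexemes


def format_tsvector(lexemes: dict[str, list[int]]) -> str:
    """Render in tsvector text form: lexemes sorted, positions comma-separated."""
    return " ".join(
        f"'{lexeme}':" + ",".join(str(p) for p in positions)
        for lexeme, positions in sorted(lexemes.items())
    )


print(format_tsvector(toy_to_tsvector("the lord of the rings.txt")))
# prints: 'lord':2 'rings.txt':5
```

Note the space (not a comma) between lexemes in the text form; commas only separate multiple positions of the same lexeme.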
> You can put the length() function around it to just get the number of
> lexemes. This is the size in terms of number of distinct lexemes, not size
> in terms of space utilization.
>
> select length(to_tsvector('the lord of the rings.txt')) "answer";
> answer
> --------
> 2
>
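length() counts lexemes, not bytes. On a server, pg_column_size() applied to a tsvector reports the number of bytes used to store that value (possibly compressed). To see where those bytes come from, here is a rough Python estimator. Its layout constants are assumptions about the in-memory tsvector format, not something stated in this thread: an 8-byte header, a 4-byte entry per lexeme, then the lexeme strings, each followed (when it has positions) by padding to a 2-byte boundary, a 2-byte position count and 2 bytes per position. Under that model the documented 1MB limit applies only to the lexeme-and-position area.

```python
# Rough tsvector size estimator. The layout constants are ASSUMPTIONS
# about PostgreSQL's tsvector format, not values taken from this thread.
HEADER_BYTES = 8       # assumed: 4-byte varlena length + 4-byte lexeme count
ENTRY_BYTES = 4        # assumed: one 32-bit entry (length/offset) per lexeme
COUNT_BYTES = 2        # assumed: uint16 number of positions per lexeme
POSITION_BYTES = 2     # assumed: uint16 per position (weight bits included)
MAX_POSITIONS = 256    # documented limit on positions per lexeme
LIMIT_BYTES = 1 << 20  # documented: lexemes + positions must be < 1MB


def lexeme_area_bytes(tsv: dict[str, list[int]]) -> int:
    """Bytes of lexeme text plus position data: the part the 1MB limit covers."""
    offset = 0
    for lexeme in sorted(tsv):  # lexemes are stored in sorted order
        offset += len(lexeme.encode("utf-8"))
        positions = tsv[lexeme][:MAX_POSITIONS]
        if positions:
            offset += offset % 2  # pad to a 2-byte boundary
            offset += COUNT_BYTES + POSITION_BYTES * len(positions)
    return offset


def tsvector_bytes(tsv: dict[str, list[int]]) -> int:
    """Estimated total size: header + entry array + lexeme/position area."""
    return HEADER_BYTES + ENTRY_BYTES * len(tsv) + lexeme_area_bytes(tsv)


example = {"lord": [2], "rings.txt": [5]}
print(len(example), tsvector_bytes(example))     # prints: 2 38
print(lexeme_area_bytes(example) < LIMIT_BYTES)  # prints: True
```

For this tiny input the estimate is 38 bytes against 25 bytes of text, about 1.5x; under the same model a repeated word adds at most 2 bytes, so text with many repeats should come out relatively smaller.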
> You might find the tsvector data consumes 2x the space required by the
> input text. It will depend on your configuration and your input data. Test
> it and let us know what you find.
>
> -Mark
>
> -----Original Message-----
> *From:* Tim [mailto:elatllat(at)gmail(dot)com]
> *Sent:* Monday, June 13, 2011 03:19 PM
> *To:* pgsql-admin(at)postgresql(dot)org
> *Subject:* [ADMIN] tsvector limitations
>
> Dear list,
>
> How big a file would one need to fill the 1MB limit of a tsvector?
> Reading http://www.postgresql.org/docs/9.0/static/textsearch-limitations.html
> seems to hint that filling a tsvector is improbable.
>
> Is there an easy way to query the size in bytes of a tsvector?
> Something like length(tsvector), but bytes(tsvector).
>
> If there is no easy method to query the bytes of a tsvector: I realize the
> answer is highly dependent on the contents of the file, so here are two
> arbitrary examples:
> How many bytes would the tsvector of a 32MB ASCII English list of unique
> words take?
> How many bytes would the tsvector of something like "The Lord of the
> Rings.txt" take?
>
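A back-of-the-envelope sketch for the unique-word-list case, under an assumed tsvector layout in which each lexeme costs its L bytes of text, 1 padding byte when L is odd, a 2-byte position count and 2 bytes per position, all counted against the documented 1MB limit. It assumes every lexeme has the same hypothetical length and ignores stemming, which can merge words:

```python
LIMIT_BYTES = 1 << 20  # documented: lexemes + positions must be < 1MB
COUNT_BYTES = 2        # assumed: uint16 position count per lexeme
POSITION_BYTES = 2     # assumed: uint16 per stored position


def max_distinct_lexemes(lexeme_len: int, positions_each: int = 1) -> int:
    """Largest n such that n equal-length lexemes stay under the limit.

    Assumes each lexeme starts on an even offset (true when every lexeme
    has positions), so it needs one padding byte exactly when its length is odd.
    """
    per_lexeme = (lexeme_len + lexeme_len % 2
                  + COUNT_BYTES + POSITION_BYTES * positions_each)
    return (LIMIT_BYTES - 1) // per_lexeme  # ensures n * per_lexeme < LIMIT_BYTES


print(max_distinct_lexemes(8))       # prints: 87381
print(max_distinct_lexemes(8, 256))  # prints: 2008
```

With 8-byte lexemes and one position each, that is roughly 87 thousand distinct words. Under this model a 32MB file of unique words needs close to 32MB of lexeme text alone, so it would not fit in a single tsvector unless stemming merged the vast majority of its words.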
> If this limit is ever hit, is there a common practice for using more than
> one tsvector?
> A separate "one to many" table seems like an obvious part of the solution,
> but I would not know how to detect or calculate how much text to give each
> tsvector.
> Assuming tsvectors can't be linked, maybe they would need some overlap.
>
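One way to size the rows of such a one-to-many table, sketched in Python: walk the words in order and close a chunk before its estimated lexeme-and-position size would exceed a budget, repeating the last few words at the start of the next chunk so a phrase crossing the boundary appears whole in one of them. The per-word cost is a deliberately pessimistic bound under an assumed tsvector layout (the word's bytes, 1 padding byte, a 2-byte position count and a 2-byte position, as if every word were a new lexeme). It also assumes each whitespace-separated word yields at most one lexeme no longer than the word itself; hyphenated words, which the default parser also indexes by part, break that assumption, hence the hypothetical default budget of half the limit. The budget and overlap defaults are illustrative only.

```python
LIMIT_BYTES = 1 << 20  # documented: lexemes + positions must be < 1MB


def word_cost(word: str) -> int:
    """Pessimistic bytes one word adds: text + 1 pad + 2-byte count + 2-byte position."""
    return len(word.encode("utf-8")) + 5


def chunk_words(words: list[str], budget: int = LIMIT_BYTES // 2,
                overlap: int = 10) -> list[list[str]]:
    """Split words into chunks whose estimated cost stays within budget.

    Consecutive chunks share up to `overlap` words so that short phrases
    spanning a boundary are still found within a single chunk.
    """
    chunks: list[list[str]] = []
    start = 0
    while start < len(words):
        end, used = start, 0
        while end < len(words) and used + word_cost(words[end]) <= budget:
            used += word_cost(words[end])
            end += 1
        if end == start:
            raise ValueError(f"word too large for budget: {words[start][:20]!r}")
        chunks.append(words[start:end])
        if end == len(words):
            break
        start = max(end - overlap, start + 1)  # step back for overlap, always advance
    return chunks


words = [f"w{i}" for i in range(10)]  # each word costs 2 + 5 = 7 bytes
chunks = chunk_words(words, budget=21, overlap=1)
print(chunks[0], chunks[1])  # prints: ['w0', 'w1', 'w2'] ['w2', 'w3', 'w4']
```

Each chunk would then go into its own row of the child table, and a document matches if any of its rows does.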
>
> Thanks in advance.
>
>