From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: more than 2GB data string save
Date: 2010-02-10 07:30:40
Message-ID: 74C956DB-7C49-498F-B19B-E796EB1876D0@blighty.com
Lists: pgsql-general
On Feb 9, 2010, at 11:21 PM, Scott Marlowe wrote:
> On Wed, Feb 10, 2010 at 12:11 AM, Steve Atkins <steve(at)blighty(dot)com> wrote:
>> A database isn't really the right way to do full text search for single files that big. Even if they'd fit in the database it's way bigger than the underlying index types tsquery uses are designed for.
>>
>> Are you sure that the documents are that big? A single document of that size would be 400 times the size of the bible. That's a ridiculously large amount of text, most of a small library.
>>
>> If the answer is "yes, it's really that big and it's really text" then look at clucene or, better, hiring a specialist.
>
> I'm betting it's something like gene sequences or geological samples,
> or something other than straight text. But even those bear breaking
> down into some kind of simple normalization scheme don't they?
An entire human genome is a shade over 3 billion base pairs, with an
information content of well under a gigabyte.
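For a quick sanity check on that figure (purely back-of-the-envelope, assuming a plain 2-bit-per-base encoding for A/C/G/T and no compression of repeats or use of a reference genome):

    # Rough estimate of raw storage for a haploid human genome
    base_pairs = 3_000_000_000   # roughly 3 billion base pairs
    bits_per_base = 2            # log2(4) for the four bases A/C/G/T

    size_bytes = base_pairs * bits_per_base / 8
    print(f"{size_bytes / 1024**3:.2f} GiB")  # prints about 0.70 GiB

So even the naive encoding comes in comfortably under a gigabyte.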
The earth is about 4 billion years old, so if you were sampling every
couple of years and had the perfect core sample... maybe.
I'm not sure that any form of full text search that includes stemming
would be terribly useful for either.
Cheers,
Steve