Re: TEXT column > 1Gb

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Joe Carlson <jwcarlson(at)lbl(dot)gov>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: TEXT column > 1Gb
Date: 2023-04-12 21:29:34
Message-ID: 4e976074-5708-01a8-4cbf-f317e4f424b0@gmail.com
Lists: pgsql-general

On 4/12/23 15:03, Joe Carlson wrote:
>
>
>> On Apr 12, 2023, at 12:21 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>>
>> On 4/12/23 13:02, Ron wrote:
>>> /Must/ the genome all be in one big file, or can you store them one
>>> line per table row?
>
> The assumption in the schema I’m using is 1 chromosome per record.
> Chromosomes are typically strings of continuous sequence (A, C, G, or
> T) separated by gaps (N) of approximately known, or completely unknown
> size. In the past this has not been a problem since sequenced
> chromosomes were maybe 100 megabases. But sequencing technology has
> improved and people are tackling more complex genomes, so gigabase
> chromosomes are now common.
>
> A typical use case might be from someone interested in seeing if they
> can identify the regulatory elements (the on or off switches) of a
> gene. The protein coding part of a gene can be predicted pretty
> reliably, but the upstream untranslated region and regulatory elements
> are tougher. So they might come to our web site and want to extract
> the 5 kb bit of sequence before the start of the gene and look for
> some of the common motifs that signify a protein binding site. Being
> able to quickly pull out a substring of the genome to drive a web app
> is something we want to do.
>>

Well, if you're actually using the sequence, both text and bytea are
inherently substring friendly.  Your problem goes back to transferring
large strings, and that's where http/tomcat is your friend.  Sounds like
you're web friendly already.  You have to stream from the
client/supplier, of course.
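
For the 5 kb upstream case, a rough sketch of the substring arithmetic might
look like the following. The table and column names (chromosome, name, seq),
the helper names, and the forward-strand assumption are all made up for
illustration and are not Joe's actual schema. PostgreSQL's substr() is
1-based, so the window is clipped at position 1 when the gene sits near the
start of the chromosome.

```python
# Hypothetical schema: chromosome(name text, seq text). These names are
# assumptions for illustration only.
UPSTREAM_SQL = "SELECT substr(seq, %s, %s) FROM chromosome WHERE name = %s"


def upstream_window(gene_start, flank=5000):
    """Return (start, length) covering the `flank` bases before gene_start.

    Coordinates are 1-based, matching PostgreSQL's substr(string, start, count).
    Assumes a forward-strand gene; the window is clipped at the start of the
    chromosome, so length may be smaller than flank (or zero).
    """
    if gene_start < 1 or flank < 0:
        raise ValueError("gene_start must be >= 1 and flank must be >= 0")
    start = max(1, gene_start - flank)
    length = gene_start - start
    return start, length


def upstream_params(chrom, gene_start, flank=5000):
    """Build the parameter tuple for UPSTREAM_SQL (start, length, name)."""
    start, length = upstream_window(gene_start, flank)
    return (start, length, chrom)


def local_substr(seq, start, length):
    """Mimic substr(seq, start, length) on a Python string for start >= 1."""
    return seq[start - 1:start - 1 + length]
```

With a DB-API driver that uses %s placeholders, the query would be run as
something like cursor.execute(UPSTREAM_SQL, upstream_params("Chr1", 10001)),
so only the requested 5 kb crosses the wire rather than the whole chromosome.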
