Re: Advice for using integer arrays?

From: Michael Heaney <mheaney(at)jcvi(dot)org>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Advice for using integer arrays?
Date: 2015-01-06 21:18:29
Message-ID: 54AC5125.70106@jcvi.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/6/2015 2:19 PM, Jeff Janes wrote:
> On Tue, Jan 6, 2015 at 9:09 AM, Michael Heaney <mheaney(at)jcvi(dot)org
> <mailto:mheaney(at)jcvi(dot)org>> wrote:
>
> I'm fairly new to Postgres, and have a design issue for which an
> array of integers might be a good solution. But I'd like to hear
> from the experts before proceeding down this path.
>
> Essentially, I'm trying to model the relationship between a group
> of biological samples and their genes. Each sample (there are ~10K
> of them now, with more coming) will have about 30,000 genes.
> Conversely, a particular gene may be present in almost all samples.
>
> So I've created the following table to handle the many-to-many
> relationship:
>
> create table sample_gene (id serial, sample_id int, gene_id int);
>
>
> What is the value of having this table at all? It doesn't seem to
> contain anything informative, like an allele identifier, a resequence,
> or a copy number variation. If you are just trying to record the fact
> that a gene was present in that sample, perhaps it would be better to
> instead record the genes have been deleted, rather than the ones that
> have not been deleted? That would probably be a much smaller list.
>

I suppose there could be a gene table which would contain data about
each gene_id.
But I'm an IT guy, not a biologist, and my sample_gene table doesn't
actually
exist. I'm more concerned with how to deal with many-to-many relationships
when each parent could have tens of thousands of children. Collapsing all
the children into an array for each parent looked intriguing - but maybe
it's
not a good idea. I just don't know, so I thought I'd ask you guys.

> create table sample_gene_array (id serial, sample_id int, gene_id
> int [] );
>
> So now the table data looks like this:
>
> sample_id | gene_id []
> ---------------------------------------
> 1 | [1:30475]
> 2 | [1:29973]
> etc.
>
>
> I'm not familiar with the square bracket and colon as a syntax for
> expressing int arrays. Are you taking liberties with the psql output,
> or using a different client program? Does that represent the range
> from 1 to 30475, or the two values 1 and 30475?

Yes, it's shorthand for the full range of values from 1 to 30475. Wasn't
sure how
to represent it, and almost went with (1,2,3...30475). Apologies...

Michael Heaney
JCVI

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Grittner 2015-01-06 21:34:20 Re: How to monitor locks (max_pred_locks_per_transaction)?
Previous Message Alban Hertroys 2015-01-06 20:15:25 Re: Correct/optimal DML query for application session management ?