Quick Links

Re: Advice for using integer arrays?

From:	Michael Heaney <mheaney(at)jcvi(dot)org>
To:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Advice for using integer arrays?
Date:	2015-01-06 21:18:29
Message-ID:	54AC5125.70106@jcvi.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 1/6/2015 2:19 PM, Jeff Janes wrote:
> On Tue, Jan 6, 2015 at 9:09 AM, Michael Heaney <mheaney(at)jcvi(dot)org
> <mailto:mheaney(at)jcvi(dot)org>> wrote:
>
> I'm fairly new to Postgres, and have a design issue for which an
> array of integers might be a good solution. But I'd like to hear
> from the experts before proceeding down this path.
>
> Essentially, I'm trying to model the relationship between a group
> of biological samples and their genes. Each sample (there are ~10K
> of them now, with more coming) will have about 30,000 genes.
> Conversely, a particular gene may be present in almost all samples.
>
> So I've created the following table to handle the many-to-many
> relationship:
>
> create table sample_gene (id serial, sample_id int, gene_id int);
>
>
> What is the value of having this table at all? It doesn't seem to
> contain anything informative, like an allele identifier, a resequence,
> or a copy number variation. If you are just trying to record the fact
> that a gene was present in that sample, perhaps it would be better to
> instead record the genes have been deleted, rather than the ones that
> have not been deleted? That would probably be a much smaller list.
>

I suppose there could be a gene table which would contain data about
each gene_id.
But I'm an IT guy, not a biologist, and my sample_gene table doesn't
actually
exist. I'm more concerned with how to deal with many-to-many relationships
when each parent could have tens of thousands of children. Collapsing all
the children into an array for each parent looked intriguing - but maybe
it's
not a good idea. I just don't know, so I thought I'd ask you guys.

> create table sample_gene_array (id serial, sample_id int, gene_id
> int [] );
>
> So now the table data looks like this:
>
> sample_id | gene_id []
> ---------------------------------------
> 1 | [1:30475]
> 2 | [1:29973]
> etc.
>
>
> I'm not familiar with the square bracket and colon as a syntax for
> expressing int arrays. Are you taking liberties with the psql output,
> or using a different client program? Does that represent the range
> from 1 to 30475, or the two values 1 and 30475?

Yes, it's shorthand for the full range of values from 1 to 30475. Wasn't
sure how
to represent it, and almost went with (1,2,3...30475). Apologies...

Michael Heaney
JCVI

In response to

Re: Advice for using integer arrays? at 2015-01-06 19:19:56 from Jeff Janes

Responses

Re: Advice for using integer arrays? at 2015-01-06 22:24:59 from Paul Jungwirth
Re: Advice for using integer arrays? at 2015-01-07 00:19:23 from Adrian Klaver

Browse pgsql-general by date

	From	Date	Subject
Next Message	Kevin Grittner	2015-01-06 21:34:20	Re: How to monitor locks (max_pred_locks_per_transaction)?
Previous Message	Alban Hertroys	2015-01-06 20:15:25	Re: Correct/optimal DML query for application session management ?