From: | Michael Heaney <mheaney(at)jcvi(dot)org> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Advice for using integer arrays? |
Date: | 2015-01-06 21:18:29 |
Message-ID: | 54AC5125.70106@jcvi.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 1/6/2015 2:19 PM, Jeff Janes wrote:
> On Tue, Jan 6, 2015 at 9:09 AM, Michael Heaney <mheaney(at)jcvi(dot)org
> <mailto:mheaney(at)jcvi(dot)org>> wrote:
>
> I'm fairly new to Postgres, and have a design issue for which an
> array of integers might be a good solution. But I'd like to hear
> from the experts before proceeding down this path.
>
> Essentially, I'm trying to model the relationship between a group
> of biological samples and their genes. Each sample (there are ~10K
> of them now, with more coming) will have about 30,000 genes.
> Conversely, a particular gene may be present in almost all samples.
>
> So I've created the following table to handle the many-to-many
> relationship:
>
> create table sample_gene (id serial, sample_id int, gene_id int);
>
>
> What is the value of having this table at all? It doesn't seem to
> contain anything informative, like an allele identifier, a resequence,
> or a copy number variation. If you are just trying to record the fact
> that a gene was present in that sample, perhaps it would be better to
> instead record the genes have been deleted, rather than the ones that
> have not been deleted? That would probably be a much smaller list.
>
I suppose there could be a gene table which would contain data about
each gene_id.
But I'm an IT guy, not a biologist, and my sample_gene table doesn't
actually
exist. I'm more concerned with how to deal with many-to-many relationships
when each parent could have tens of thousands of children. Collapsing all
the children into an array for each parent looked intriguing - but maybe
it's
not a good idea. I just don't know, so I thought I'd ask you guys.
> create table sample_gene_array (id serial, sample_id int, gene_id
> int [] );
>
> So now the table data looks like this:
>
> sample_id | gene_id []
> ---------------------------------------
> 1 | [1:30475]
> 2 | [1:29973]
> etc.
>
>
> I'm not familiar with the square bracket and colon as a syntax for
> expressing int arrays. Are you taking liberties with the psql output,
> or using a different client program? Does that represent the range
> from 1 to 30475, or the two values 1 and 30475?
Yes, it's shorthand for the full range of values from 1 to 30475. Wasn't
sure how
to represent it, and almost went with (1,2,3...30475). Apologies...
Michael Heaney
JCVI
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2015-01-06 21:34:20 | Re: How to monitor locks (max_pred_locks_per_transaction)? |
Previous Message | Alban Hertroys | 2015-01-06 20:15:25 | Re: Correct/optimal DML query for application session management ? |