From: | Michael Heaney <mheaney(at)jcvi(dot)org> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Advice for using integer arrays? |
Date: | 2015-01-06 17:09:56 |
Message-ID: | 54AC16E4.9060709@jcvi.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I'm fairly new to Postgres, and have a design issue for which an array
of integers might be a good solution. But I'd like to hear from the
experts before proceeding down this path.
Essentially, I'm trying to model the relationship between a group of
biological samples and their genes. Each sample (there are ~10K of them
now, with more coming) will have about 30,000 genes. Conversely, a
particular gene may be present in almost all samples.
So I've created the following table to handle the many-to-many relationship:
create table sample_gene (id serial, sample_id int, gene_id int);
which looks like this when populated:
sample_id | gene_id
---------------------------------------
1 | 1
1 | 2
...
1 | 30475
2 | 1
2 | 2
...
2 | 29973
3 | 1
etc.
The table now contains hundreds of millions of rows (with many, many
more to come). Join performance between samples and genes is quite
slow, even with indexes on sample_id and gene_id.
So it occurred to me: why not eliminate all the duplicate sample_id
values by storing the gene_id's in an array, like so:
create table sample_gene_array (id serial, sample_id int, gene_id int
[] );
So now the table data looks like this:
sample_id | gene_id []
---------------------------------------
1 | [1:30475]
2 | [1:29973]
etc.
The new table is significantly smaller, and performance (using ANY[] )
is quite good. Nevertheless, I'm uneasy. I come from a Sybase ASE
background, and so have no experience with arrays as datatypes. Is it
okay to store 30K+ gene values in an array in the linking table (or
maybe even in the sample table itself, thus eliminating the linking
table)? Should I unnest the gene_id's first, before using them to join
to the gene table?
TIA for any guidance you can provide. Again, I'm a Postgres neophyte -
but I'm in awe of the power and flexibility of this database, and wish
that I'd started using it sooner.
------
Michael Heaney
JCVI
From | Date | Subject | |
---|---|---|---|
Next Message | Arthur Silva | 2015-01-06 17:48:00 | Re: Advice for using integer arrays? |
Previous Message | Jonathan Vanasco | 2015-01-06 17:07:00 | Re: postgresql versus riak for a global exchange |