| From: | Michael Lush <mjlush(at)gmail(dot)com> |
|---|---|
| To: | pgsql-novice(at)postgresql(dot)org |
| Subject: | Big wide datasets |
| Date: | 2011-12-08 13:05:19 |
| Message-ID: | CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com |
| Lists: | pgsql-novice |
I have a dataset with ~10000 columns and ~200000 rows (GWAS data (1)) in the
form:
sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....
I'd like to take subsets of both columns and rows for analysis.
Two approaches spring to mind. The first is to unpack it into something like
an RDF triple, i.e.
CREATE TABLE long_table (
sample_id varchar(20),
column_number int,
snp_data varchar(3));
for a table with ~2 billion rows (~200000 samples x ~10000 SNP columns).
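With the long form, a row/column subset would then be a plain WHERE clause plus
an index, roughly like this (sample ids and column range purely illustrative):
-- index to make subset queries on the ~2-billion-row table workable
CREATE INDEX long_table_idx ON long_table (sample_id, column_number);
-- pull chosen samples and a range of SNP columns
SELECT sample_id, column_number, snp_data
FROM long_table
WHERE sample_id IN ('sample1', 'sample2')
AND column_number BETWEEN 100 AND 200;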
The second is to use the array datatype:
CREATE TABLE wide_table (
sample_id varchar(20),
snp_data varchar(3)[]);
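With the array form, a row subset is the same WHERE clause and a column subset
becomes an array slice (Postgres arrays are 1-based), again just a sketch:
-- chosen samples, with SNP columns 100-200 returned as a sub-array
SELECT sample_id, snp_data[100:200] AS snp_subset
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');
unnest(snp_data) would get back to the long form later if that turned out to be
needed.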
Does anyone have any experience of this sort of thing?
(1) http://en.wikipedia.org/wiki/Genome-wide_association_study
--
Michael Lush