From: Michael Lush <mjlush(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Big wide datasets
Date: 2011-12-08 13:05:19
Message-ID: CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com
Lists: pgsql-novice
I have a dataset with ~10000 columns and ~200000 rows (GWAS data (1)) in the
form:
sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....
I'd like to take subsets of both columns and rows for analysis
Two approaches spring to mind. One is to unpack it into something like an RDF
triple, i.e.
CREATE TABLE long_table (
sample_id varchar(20),
column_number int,
snp_data varchar(3));
for a table with ~2 billion rows.
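Something like the following is what I have in mind for subsetting in that
layout (just a sketch; the index and the particular sample/column values are
placeholders):
-- index assumed, to make row/column subsetting workable at this scale
CREATE INDEX long_table_sample_col_idx ON long_table (sample_id, column_number);
-- pull a handful of samples and a range of SNP columns
SELECT sample_id, column_number, snp_data
FROM long_table
WHERE sample_id IN ('sample1', 'sample2')
AND column_number BETWEEN 1 AND 500;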
The other is to use the array datatype:
CREATE TABLE wide_table (
sample_id varchar(20),
snp_data varchar(3)[]);
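With the array layout I imagine column subsets would come out as array
slices, along these lines (the slice bounds are again placeholders):
-- take SNP columns 1..500 for a couple of samples via an array slice
SELECT sample_id, snp_data[1:500]
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');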
Does anyone have any experience of this sort of thing?
(1) http://en.wikipedia.org/wiki/Genome-wide_association_study
--
Michael Lush