Big wide datasets

From: Michael Lush <mjlush(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Big wide datasets
Date: 2011-12-08 13:05:19
Message-ID: CACXX7MdoDdACfJMfhnugNoGxAhe-n5kxr716tGt6iUZ1n4ZKyQ@mail.gmail.com
Lists: pgsql-novice

I have a dataset with ~10000 columns and ~200000 rows (GWAS data (1)) in the
form:

sample1, A T, A A, G C, ....
sample2, A C, C T, A A, ....

I'd like to take subsets of both columns and rows for analysis.

Two approaches spring to mind: either unpack it into something like an RDF
triple, i.e.

CREATE TABLE long_table (
sample_id varchar(20),
column_number int,
snp_data varchar(3));

for a table of roughly 2 billion rows (~200000 samples x ~10000 SNP columns),
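where pulling a row/column subset would presumably be a query along these
lines (the index and the sample names below are just illustrative):

CREATE INDEX long_table_idx ON long_table (sample_id, column_number);

-- e.g. a block of SNP columns for a couple of samples
SELECT sample_id, column_number, snp_data
FROM long_table
WHERE sample_id IN ('sample1', 'sample2')
  AND column_number BETWEEN 100 AND 200;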

or use the array datatype

CREATE TABLE wide_table (
sample_id varchar(20),
snp_data varchar(3)[]);
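
Here a subset could presumably be taken with an array slice, something like
(column range and sample names again just for illustration):

-- PostgreSQL arrays are 1-based; [100:200] takes SNP columns 100-200
SELECT sample_id, snp_data[100:200] AS snp_subset
FROM wide_table
WHERE sample_id IN ('sample1', 'sample2');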

Does anyone have any experience of this sort of thing?

(1) http://en.wikipedia.org/wiki/Genome-wide_association_study

--
Michael Lush
