I have a dataset with ~100,000 columns and ~200,000 rows (GWAS study
data (1)). None of the fields has more than about 15 characters, e.g.
Sample1,A
I want to be able to extract subsets of columns and rows for analysis.
I can see two ways to approach this:
Convert it to something RDF-like, i.e.
  sample_id
  column_number
  data
to make a table with 3 columns and 20 billion rows (100,000 x 200,000),
or use the array datatype:
  sample_id
  array_of_data
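
To make the comparison concrete, here is a toy sketch of the two layouts in plain Python. It is only a stand-in for the database tables, not database code: the sample names, genotype values, and the helper functions subset_long and subset_wide are all made up for illustration, and the data is shrunk to 4 samples by 6 columns.

```python
# Toy stand-in for the real data: 4 samples x 6 columns instead of
# ~200,000 x ~100,000. All names and values here are hypothetical.
N_SAMPLES, N_COLUMNS = 4, 6
GENOTYPES = ["A", "C", "G", "T"]

matrix = {
    f"Sample{s + 1}": [GENOTYPES[(s + c) % 4] for c in range(N_COLUMNS)]
    for s in range(N_SAMPLES)
}

# Layout 1: one (sample_id, column_number, data) triple per field.
long_rows = [
    (sample_id, column_number, value)
    for sample_id, values in matrix.items()
    for column_number, value in enumerate(values, start=1)
]

# Layout 2: one (sample_id, array_of_data) pair per sample.
wide_rows = list(matrix.items())


def subset_long(rows, sample_ids, column_numbers):
    """Filter the triple table, then regroup the fields by sample."""
    wanted_samples, wanted_columns = set(sample_ids), set(column_numbers)
    result = {}
    for sample_id, column_number, value in rows:
        if sample_id in wanted_samples and column_number in wanted_columns:
            result.setdefault(sample_id, {})[column_number] = value
    return result


def subset_wide(rows, sample_ids, column_numbers):
    """Pick whole rows by sample, then index into each array (1-based)."""
    wanted_samples = set(sample_ids)
    return {
        sample_id: {c: values[c - 1] for c in column_numbers}
        for sample_id, values in rows
        if sample_id in wanted_samples
    }


samples = ["Sample1", "Sample3"]
columns = [2, 5]

# Both layouts hold the same information, so the subsets must agree.
assert subset_long(long_rows, samples, columns) == subset_wide(wide_rows, samples, columns)

# Row counts at the full size quoted above.
full_long_rows = 100_000 * 200_000  # one triple per field
full_wide_rows = 200_000            # one array row per sample
print(len(long_rows), len(wide_rows), full_long_rows, full_wide_rows)
```

Both layouts answer the same subset query. The difference is in how many rows have to be stored and touched: the triple layout has one row per field, while the array layout has one row per sample and selects columns by position within each array.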
(1) http://en.wikipedia.org/wiki/Genome-wide_association_study
--
Michael