From: Jeff Gentry <jgentry(at)jimmy(dot)harvard(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Confronting the maximum column limitation
Date: 2008-08-12 20:15:20
Message-ID: Pine.SOL.4.20.0808121606020.10620-100000@noah.dfci.harvard.edu
Lists: pgsql-general
Hi there ...
I recently discovered that there is a hard cap on the number of columns
in a table, set at 1600. I also understand that it is generally
considered unfathomable that anyone would ever feel limited by that
number ... however, I've managed to bump into it myself and was looking
to see if anyone had advice on how to manage the situation.
As a bit of background, we have a Postgres database that manages
information revolving around genomic datasets, including the datasets
themselves. The actual data is treated in other applications as a
matrix, and while it has made the DB design sub-optimal, the model of
just stashing the entire matrix in the DB has worked (the rest of the
DB design is proper; only the storage of these matrices straight up is
unorthodox ... for the convenience of having everything in the same
storage unit as all of the other information, it has been worth the
extra headache and potential performance dings).
In these matrices, columns represent biological samples, rows represent
fragments of the genome, and the cells are populated with values. There
are a variety of row configurations (depending on what chip the samples
were handled on), ranging from a few thousand to a few hundred thousand
rows and constantly expanding upwards. The real problem lies with the
columns (biological samples), in that it is rarely the case that we'll
have multiple matrices with overlapping columns - and even when that
happens, it is almost never a good idea to treat them as the same thing.
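
For concreteness, here's a rough sketch of the wide shape these
matrices take when stashed straight into a table (all names below are
made up for illustration) - it's this one-column-per-sample layout that
runs headlong into the 1600 cap:

    -- Hypothetical wide layout: one column per biological sample.
    -- Every new sample adds a column, so a matrix with more than
    -- ~1600 samples cannot be represented in this shape at all.
    CREATE TABLE expression_matrix (
        fragment_id text PRIMARY KEY,  -- genome fragment (matrix row)
        sample_0001 real,
        sample_0002 real
        -- ... one column per sample, up to the hard cap
    );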
Mind you, this is a world where having a set with a few hundred samples is
still considered pretty grandiose - I just happened to have one of the
very few out there which would come anywhere close to breaking the 1600
barrier and it is unlikely to really be an issue for at least a few (if
not more) years ... but looking down the road it'd be better to nip this
in the bud now than punt it until it becomes a real issue.
So I've seen the header file where the 1600 column limit is defined,
and I know the arguments that no one should ever want to come anywhere
close to that limit. I'm willing to accept that these matrices could be
stored in some alternate configuration, although I don't really know
what that would be. It's possible that the right answer might be "pgsql
just isn't the right tool for this job", or even that punting it down
the road might be the correct choice. I was just hoping that some folks
here might be able to share their thoughts.
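
One alternate configuration that has occurred to me (though I have no
idea whether it's sane at this scale) is a "long" layout - one row per
matrix cell rather than one column per sample - which sidesteps the
column cap entirely. A rough sketch, again with made-up names and
assuming each cell is a single float:

    -- Hypothetical "long" layout: one row per matrix cell, so the
    -- number of samples is no longer bounded by the column limit.
    CREATE TABLE matrix_cell (
        matrix_id   integer NOT NULL,
        fragment_id text    NOT NULL,  -- matrix row (genome fragment)
        sample_id   text    NOT NULL,  -- matrix column (sample)
        value       real,
        PRIMARY KEY (matrix_id, fragment_id, sample_id)
    );

    -- Pulling back one row of the matrix becomes a filtered scan:
    SELECT sample_id, value
      FROM matrix_cell
     WHERE matrix_id = 1
       AND fragment_id = 'chr1:10000-10500';

The obvious worry there is sheer row count - a few hundred thousand
fragments times a few hundred samples is already tens of millions of
rows per matrix - which is part of why I'd welcome other ideas.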