>ISTM we would be able to do this fairly well if we implemented >Index-only columns. i.e. columns that don't exist in the heap, only in >an index. >Taken to the extreme, all columns could be removed from the heap and >placed in an index(es). Only the visibility information would remain on >the heap. So, let me understand this correctly - you want to index the columns and use the index to reconstruct the data? Some kind of "implicit" reconstruction? >Doing this per column would be a big win over vertical databases >since AFAIK they *have* to do this to *every* column, even if it is not >beneficial to do so. <snip> I was thinking about something a little more crude - each column being a free-standing table, but being "viewed" by the client as a single entity, a kind of "data federation". The idea was that the existing storage mechanism wouldn't be altered, and with a little slight-of-hand, we could extend the mechanism without hampering things like clustering, future extensions, optimizations, etc. No changes to MVCC would be needed, because it would above and through it. The idea was that for a "wide" table you target only a few columns, so you could sequential read without the penalty of having to seek to a new offset for each record. Instead of processing all the columns, you process a single column, which means less data to read for the same number of records. That gain starts to slope off when you specify more and more columns from the table. Throw in selective indexing (similar to what you were talking about) and suddenly we can reclaim some of that lost speed. We'll make a kind of "compressed index", where the key turns into a hash that points to (is attached to?) a bucket, and the bucket contains the offset of all the records that relate to that key. Other tricks can be employed, such as run-length encoding entire ranges of offsets, etc. to keep this really really small. Really small = faster and faster to read, using more CPU than I/O. And I/O is still more of an issue than CPU at this point. Then again, if you ditch the column altogether and use a "compressed index" to reconstruct data implicitly, now we're close to what you were talking about (assuming I understand you correctly and also assuming that PostgreSQL doesn't already do this with indexes). So, if the column is indexed, then maybe split it off into a "compressed index", and if not, keep it in the main table outright? I guess I really need to think about this a bit more before I start delving into code. <Vertical DB market discussion - snipped> >I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-) I like it. :-) I just wish I would have read this before applying for a project name at pgfoundry, the current proposal is given as "pg-cstore".