From: | Qingqing Zhou <zhouqq(dot)postgres(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: On columnar storage |
Date: | 2015-06-11 23:58:00 |
Message-ID: | CAJjS0u2Lh9ix9Ff7_gigXJEfC1+yPkoOdAbyzMFs+P3PQNiY+Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jun 11, 2015 at 4:03 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres. Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
And a better compression ratio.
> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.
>
Agreed. It is cleaner to add a ColumnScan node that scans a columnar
table, and possibly a ColumnIndexScan node for seeks on an indexed
columnar table.
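To make the distinction between the two nodes concrete, here is a
rough sketch in Python. It is not PostgreSQL code: ColumnarTable,
column_scan, build_index and column_index_scan are hypothetical names,
and a sorted list stands in for a real index. It only shows that a
scan touches just the referenced columns, while an index scan seeks to
a position first.

```python
from bisect import bisect_left


class ColumnarTable:
    """Toy columnar table: one list per column, aligned by position."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        assert len(lengths) <= 1, "all columns must have the same length"
        self.columns = columns
        self.nrows = lengths.pop() if lengths else 0


def column_scan(table, wanted, predicate=None):
    """Sequential scan that reads only the columns named in `wanted`."""
    for pos in range(table.nrows):
        row = {name: table.columns[name][pos] for name in wanted}
        if predicate is None or predicate(row):
            yield row


def build_index(table, key):
    """Sorted (value, position) pairs on one column (stand-in for an index)."""
    return sorted((value, pos) for pos, value in enumerate(table.columns[key]))


def column_index_scan(table, index, value, wanted):
    """Seek to `value` in the index, then fetch only the wanted columns."""
    # Positions are never negative, so (value, -1) sorts before every match.
    i = bisect_left(index, (value, -1))
    while i < len(index) and index[i][0] == value:
        pos = index[i][1]
        yield {name: table.columns[name][pos] for name in wanted}
        i += 1


t = ColumnarTable({"id": [1, 2, 3], "price": [10, 20, 10], "note": ["a", "b", "c"]})
idx = build_index(t, "price")
cheap = list(column_index_scan(t, idx, 10, ["id"]))
```

Neither function ever reads the "note" column here, which is the point
of having dedicated scan nodes rather than going through a row-shaped
interface.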
> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.
>
If we do a real native implementation, where the columnar store sits
on par with the heap, it gives us arbitrary flexibility to control
performance and transaction behavior, without worrying about
compatibility with the interface (the one you define below).
> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation. There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
I like the concept of a half-row, half-columnar table: the row part is
good for select * and updates, and the columnar part serves other
purposes. Popular columnar-only tables use position alignment, which
is virtual (no storage), to associate the values of each column. CTIDs
are still needed, but not for this purpose. An alternative is:
1. Allow column groups, where several columns are physically stored
together;
2. Handle updates through a separate row store table associated with
each columnar table.
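A minimal sketch of that alternative, in Python for brevity. All names
(HybridTable, col_store, delta) are made up for illustration, and it
assumes at least one column group. Values in different groups are
associated purely by position, and updates land in a side row store
that reads consult first.

```python
class HybridTable:
    """Column groups aligned by position, plus a row store for updates."""

    def __init__(self, groups):
        # Each group is a tuple of column names stored together.
        self.groups = [tuple(g) for g in groups]
        self.group_of = {c: gi for gi, g in enumerate(self.groups) for c in g}
        # One list per group; the list index is the (virtual) row position.
        self.col_store = [[] for _ in self.groups]
        # Side row store: position -> full updated row.
        self.delta = {}

    def append(self, row):
        """Append a row to every column group and return its position."""
        for gi, group in enumerate(self.groups):
            self.col_store[gi].append(tuple(row[c] for c in group))
        return len(self.col_store[0]) - 1

    def _read_columnar(self, pos, wanted):
        """Read `wanted` columns at `pos`, touching only the groups needed."""
        out = {}
        for gi in {self.group_of[c] for c in wanted}:
            values = dict(zip(self.groups[gi], self.col_store[gi][pos]))
            out.update({c: values[c] for c in wanted if c in values})
        return out

    def update(self, pos, changes):
        """Record an update in the row store; columnar data is left as-is."""
        row = self.delta.get(pos)
        if row is None:
            row = self._read_columnar(pos, list(self.group_of))
        self.delta[pos] = {**row, **changes}

    def read(self, pos, wanted):
        """Prefer the row store when the row has been updated."""
        if pos in self.delta:
            return {c: self.delta[pos][c] for c in wanted}
        return self._read_columnar(pos, wanted)


t = HybridTable([("id",), ("price", "qty")])
p0 = t.append({"id": 1, "price": 10, "qty": 2})
p1 = t.append({"id": 2, "price": 20, "qty": 5})
t.update(p1, {"qty": 6})
```

After the update, t.read(p1, ["id", "qty"]) is served from the row
store, while the (price, qty) group still holds the old values until
something merges the row store back.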
> Query Processing
> ----------------
>
If we treat columnar storage as a first-class citizen like the heap,
we can model it after the heap, which enables much more natural
changes in the parser, rewriter, planner and executor.
Regards,
Qingqing