From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: On columnar storage |
Date: | 2015-06-12 17:28:47 |
Message-ID: | 20150612172847.GN133018@postgresql.org |
Lists: | pgsql-hackers |
Amit Kapila wrote:
> On Fri, Jun 12, 2015 at 4:33 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
> > There are several parts to this:
> >
> > 1. the CSM API
> > 2. Cataloguing column stores
> > 3. Query processing: rewriter, optimizer, executor
> >
>
> I think another important point is the format of column stores: will they
> use the page format used by index/heap, and how are they organised?
Not really. That stuff is part of the column store implementation
itself; we're not tackling that part just yet. Eventually there might
be an implementation using ORC or other formats. That doesn't matter at
this point -- we only need something that implements the specified API.
> > One critical detail is what will be used to identify a heap row when
> > talking to a CS implementation. There are two main possibilities:
> >
> > 1. use CTIDs
> > 2. use some logical tuple identifier
> >
> > Using CTIDs is simpler. One disadvantage is that every UPDATE of a row
> > needs to let the CS know about the new location of the tuple, so that
> > the value is known to be associated with the new tuple location as well as the
> > old. This needs to happen even if the value of the column itself is not
> > changed.
>
> Isn't this somewhat similar to index segment?
Not sure what you mean by "index segment". A column store is not an
index -- it is the primary storage for the column in question. The heap
does not have a copy of the data.
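
To make the CTID option quoted above concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory store keyed by (block, offset) pairs; the class and method names are invented for illustration and are not part of the proposed CSM API. It only shows why every UPDATE must notify the store, even when the column value itself is unchanged.

```python
from typing import Dict, Tuple

# Assumption: a CTID is modelled as (block number, offset within block).
CTID = Tuple[int, int]


class CtidKeyedColumnStore:
    """Hypothetical column store that identifies heap rows by CTID."""

    def __init__(self) -> None:
        self._values: Dict[CTID, object] = {}

    def insert(self, ctid: CTID, value: object) -> None:
        # The store is the primary storage for the column; the heap has no copy.
        self._values[ctid] = value

    def on_heap_update(self, old_ctid: CTID, new_ctid: CTID) -> None:
        # The row moved to a new location but the column value did not change.
        # The store must still learn the new CTID, and it keeps the old entry
        # so the value stays associated with both tuple locations.
        self._values[new_ctid] = self._values[old_ctid]

    def fetch(self, ctid: CTID) -> object:
        return self._values[ctid]


store = CtidKeyedColumnStore()
store.insert((0, 1), 42)
store.on_heap_update((0, 1), (0, 2))  # UPDATE touched some other column
```

With a logical tuple identifier instead of a CTID, the `on_heap_update` call would be unnecessary whenever the column value is unchanged, which is the trade-off the quoted text describes.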
> Will the column store obey snapshot model similar to current heap tuples,
> if so will it derive the transaction information from heap tuple?
Yes, visibility will be tied to the heap tuple -- a value is accessed
only when its corresponding heap row has already been determined to be
visible. One interesting point that arises from this concerns vacuum:
when are we able to remove a value from the store? I have some
not-completely-formed ideas about this.
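
The visibility rule described above can be sketched as follows. This is an illustration under stated assumptions, not the executor design: `scan_visible_values`, the `is_visible` callback, and the dictionary standing in for a column store are all hypothetical names. The point is only the ordering: the heap row is checked for visibility first, and the store is consulted only for rows that pass.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Assumption: heap rows are identified by (block, offset) pairs.
CTID = Tuple[int, int]


def scan_visible_values(
    heap_ctids: Iterable[CTID],
    is_visible: Callable[[CTID], bool],
    column_values: Dict[CTID, object],
) -> Iterator[object]:
    """Yield column values only for heap rows already found visible."""
    for ctid in heap_ctids:
        # Visibility is decided from the heap tuple alone; the store
        # carries no transaction information of its own in this sketch.
        if is_visible(ctid):
            yield column_values[ctid]


heap = [(0, 1), (0, 2), (0, 3)]
visible = {(0, 1), (0, 3)}  # stand-in for a snapshot check
values = {(0, 1): "a", (0, 2): "b", (0, 3): "c"}
result = list(scan_visible_values(heap, visible.__contains__, values))
```

Note that the value for `(0, 2)` is still present in the store even though no row exposes it here; deciding when such a value can be removed is the vacuum question raised above.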
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services