
Re: Zedstore - compressed in-core columnar storage

From:	Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To:	"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc:	PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Zedstore - compressed in-core columnar storage
Date:	2019-07-01 19:08:06
Message-ID:	CALfoeiuzsqLo5v0KT-5tgqG6pdMNLznUuPeVUhcmaWUeWuFb4A@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jun 30, 2019 at 7:59 PM Tsunakawa, Takayuki <
tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:

> From: Ashwin Agrawal [mailto:aagrawal(at)pivotal(dot)io]
> > The objective is to gather feedback on design and approach to the same.
> > The implementation has core basic pieces working but not close to
> complete.
>
> Thank you for proposing a very interesting topic. Are you thinking of
> including this in PostgreSQL 13 if possible?
>
>
> > * All Indexes supported
> ...
> > work. Btree indexes can be created. Btree and bitmap index scans work.
>
> Does Zedstore allow to create indexes of existing types on the table
> (btree, GIN, BRIN, etc.) and perform index scans (point query, range query,
> etc.)?
>

Yes, all index types work with Zedstore and support point and range queries.

> > * Hybrid row-column store, where some columns are stored together, and
> > others separately. Provide flexibility of granularity on how to
> > divide the columns. Columns accessed together can be stored
> > together.
> ...
> > This way of laying out the data also easily allows for hybrid row-column
> > store, where some columns are stored together, and others have a
> dedicated
> > B-tree. Need to have user facing syntax to allow specifying how to group
> > the columns.
> ...
> > Zedstore Table can be
> > created using command:
> >
> > CREATE TABLE <name> (column listing) USING zedstore;
>
> Are you aiming to enable Zedstore to be used for HTAP, i.e. the same table
> can be accessed simultaneously for both OLTP and analytics with the minimal
> performance impact on OLTP? (I got that impression from the word "hybrid".)
>

Well, "hybrid" is meant to convey that both a compressed row store and a
column store can be supported with the same design. It really wasn't
referring to HTAP. In general, the goal we are moving towards is for the
column store to be extremely efficient at analytics while still supporting
all the OLTP operations (with minimal performance or storage size impact).
When making trade-offs between design choices where both can't be met, the
preference is towards analytics.

> If yes, is the assumption that only a limited number of columns are to be
> stored in columnar format (for efficient scanning), and many other columns
> are to be stored in row format for efficient tuple access?
>

Yes, if it's known that certain columns are always accessed together, it's
better to store them together and avoid the tuple formation cost. Though
it's still to be seen whether compression plays a role: storing each column
individually and compressing it may still win over compressing different
columns together as a blob, depending on whether the saving in IO cost
offsets the tuple formation cost.
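
To make the trade-off concrete, here is a toy Python sketch that compares
the two layouts on made-up data. It is not Zedstore code: the sample rows,
the text serialization, and the use of zlib are all assumptions chosen only
for illustration, and real results depend heavily on the data.

```python
import zlib


def compressed_sizes(rows):
    """Return (per_column_bytes, row_blob_bytes) for a list of equal-length tuples."""
    # Column layout: serialize and compress each column on its own.
    columns = list(zip(*rows))
    per_column = sum(
        len(zlib.compress("\n".join(map(str, col)).encode()))
        for col in columns
    )
    # Row layout: serialize whole tuples together and compress one blob.
    blob = "\n".join(",".join(map(str, row)) for row in rows).encode()
    row_blob = len(zlib.compress(blob))
    return per_column, row_blob


# Hypothetical sample data: an id, a low-cardinality status, and an amount.
rows = [(i, ("new", "paid", "shipped")[i % 3], i * 7 % 1000) for i in range(10000)]
per_column, row_blob = compressed_sizes(rows)
print(f"per-column: {per_column} bytes, row blob: {row_blob} bytes")
```

This only measures compressed size; it says nothing about the tuple
formation cost, which would have to be measured separately.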

> Are those row-formatted columns stored in the same file as the
> column-formatted columns, or in a separate file?
>

Currently, we are focused on just getting the pure column store working and
hence have not coded anything for the hybrid layout yet. But at least right
now the thinking is that they would be in the same file.

> Regarding the column grouping, can I imagine HBase and Cassandra?
> How could the current CREATE TABLE syntax support column grouping? (I
> guess CREATE TABLE needs a syntax for columnar store, and Zedstore need to
> be incorporated in core, not as an extension...)
>

When column grouping comes up, yes, we will need to modify the CREATE TABLE
syntax; we have yet to reach that point in development.

> > A column store uses the same structure but we have *multiple* B-trees,
> one
> > for each column, all indexed by TID. The B-trees for all columns are
> stored
> > in the same physical file.
>
> Did you think that it's not a good idea to have a different file for each
> group of columns? Is that because we can't expect physical adjacency of
> data blocks on disk even if we separate a column in a separate file?
>
> I thought a separate file for each group of columns would be easier and
> less error-prone to implement and debug. Adding and dropping the column
> group would also be very easy and fast.
>

Currently, each group is a single column (until we have column families),
and having a file for each column definitely doesn't seem like a good idea,
as it just explodes the number of files. A separate file may have its
advantages from a pre-fetching point of view, but yes, we can't expect
physical adjacency of data blocks, plus the access pattern will anyway
involve reading multiple files (if each column is stored in a separate file).

I doubt storing each group in a separate file makes it any easier to
implement or debug; I feel it's actually the reverse. Storing everything in
a single file but in separate blocks keeps the logic contained inside the AM
layer. We also don't have to write special code, for example, for DROP TABLE
to delete the files for all the groups, or for moving a table to a different
tablespace, and all such complications.

Adding and dropping a column group can be made easy and fast regardless,
with blocks for that group added or marked for reuse within the same file.
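
As a rough illustration of the idea, here is a toy Python model of a single
file whose blocks are shared by all columns. It is only a sketch under
assumed names (SingleFileStore, allocate, drop_column are hypothetical) and
does not reflect the actual Zedstore page or free-space management.

```python
class SingleFileStore:
    """Toy model: one file made of blocks shared by all columns."""

    def __init__(self):
        self.blocks = []      # block number -> owning column name, or None if free
        self.free = []        # block numbers available for reuse
        self.by_column = {}   # column name -> list of its block numbers

    def add_column(self, name):
        """Register a new column; it owns no blocks until data is written."""
        self.by_column[name] = []

    def allocate(self, name):
        """Give column `name` one block, reusing a freed block when possible."""
        if self.free:
            blkno = self.free.pop()
            self.blocks[blkno] = name
        else:
            blkno = len(self.blocks)
            self.blocks.append(name)
        self.by_column[name].append(blkno)
        return blkno

    def drop_column(self, name):
        """Mark every block of `name` as reusable; no file is created or deleted."""
        for blkno in self.by_column.pop(name):
            self.blocks[blkno] = None
            self.free.append(blkno)


store = SingleFileStore()
store.add_column("a")
store.add_column("b")
store.allocate("a")
store.allocate("b")
store.allocate("a")

store.drop_column("a")          # blocks of "a" go onto the free list
store.add_column("c")
reused = store.allocate("c")    # takes one of the freed blocks
print(reused, len(store.blocks))
```

In this model, dropping a column touches only bookkeeping, and the file does
not grow when a new column reuses the freed blocks.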

Thank you for the questions.

In response to

RE: Zedstore - compressed in-core columnar storage at 2019-07-01 02:59:17 from Tsunakawa, Takayuki
