Re: Zedstore - compressed in-core columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-15 20:17:09
Message-ID: 20190415201709.iuekkfen4df54pbg@development
Lists: pgsql-hackers

On Mon, Apr 15, 2019 at 11:57:49AM -0700, Ashwin Agrawal wrote:
> On Mon, Apr 15, 2019 at 11:18 AM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>> Maybe. I'm not going to pretend I fully understand the internals. Does
>> that mean the container contains ZSUncompressedBtreeItem as elements? Or
>> just the plain Datum values?
>
> First, your reading of code and all the comments/questions so far have
> been highly encouraging. Thanks a lot for the same.

;-)

> The container contains ZSUncompressedBtreeItem as elements, since each item
> has to store metadata like size, undo information and the like. We don't wish
> to restrict compression to items from the same insertion session. Hence,
> yes, it doesn't just store Datum values. We wish to treat these as tuple-level
> operations, keep metadata per tuple, and be able to work at tuple-level
> granularity rather than block level.

OK, thanks for the clarification, that somewhat explains my confusion.
So if I understand it correctly, ZSCompressedBtreeItem is essentially a
sequence of ZSUncompressedBtreeItem(s) stored one after another, along
with some additional top-level metadata.
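
To make sure we mean the same thing, here is a rough sketch in C of that
reading. This is not the actual Zedstore code: DemoItemHeader, DemoContainer
and both functions are names made up for illustration, the field choices are
my assumptions, and the payload is left uncompressed so that only the
"items stored one after another" part is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-item header; NOT the real ZSUncompressedBtreeItem layout. */
typedef struct DemoItemHeader
{
    uint16_t size;      /* total item size in bytes, header included */
    uint64_t tid;       /* logical tuple identifier */
    uint64_t undo_ptr;  /* stand-in for the undo information */
} DemoItemHeader;

/* Hypothetical top-level metadata plus the items laid out back to back. */
typedef struct DemoContainer
{
    uint64_t first_tid;
    uint64_t last_tid;
    size_t   used;          /* bytes of payload[] in use */
    char     payload[256];  /* item, item, item, ... */
} DemoContainer;

/* Append header + datum bytes; returns 0 on success, -1 if it does not fit.
 * The container must be zero-initialized before the first call. */
static int
demo_container_append(DemoContainer *c, uint64_t tid, uint64_t undo_ptr,
                      const void *datum, uint16_t datum_len)
{
    DemoItemHeader hdr;
    size_t      total = sizeof(hdr) + datum_len;

    assert(c != NULL && (datum != NULL || datum_len == 0));
    if (total > UINT16_MAX || c->used + total > sizeof(c->payload))
        return -1;

    memset(&hdr, 0, sizeof(hdr));   /* also clears padding bytes */
    hdr.size = (uint16_t) total;
    hdr.tid = tid;
    hdr.undo_ptr = undo_ptr;

    memcpy(c->payload + c->used, &hdr, sizeof(hdr));
    if (datum_len > 0)
        memcpy(c->payload + c->used + sizeof(hdr), datum, datum_len);

    if (c->used == 0)
        c->first_tid = tid;
    c->last_tid = tid;
    c->used += total;
    return 0;
}

/* Walk the items by hopping from one size field to the next. */
static int
demo_container_count(const DemoContainer *c)
{
    size_t      off = 0;
    int         n = 0;

    while (off < c->used)
    {
        DemoItemHeader hdr;

        memcpy(&hdr, c->payload + off, sizeof(hdr));
        off += hdr.size;
        n++;
    }
    return n;
}
```

Note that every item carries its own header, so reaching the n-th datum means
reading all the headers in front of it.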

> Definitely many more tricks can be and need to be applied to optimize
> the storage format; for example, fixed-width columns need not store the
> size in every item. Keeping it simple is the theme we have been trying to
> maintain. Compression ideally should compress duplicate data pretty easily
> and efficiently as well, but we will try to optimize as much as we can
> without relying on it.

I think there's plenty of room for improvement. The main problem I see
is that it mixes different types of data, which is bad for compression
and vectorized execution. I think we'll end up with a very different
representation of the container, essentially decomposing the items into
arrays of values of the same type - array of TIDs, array of undo
pointers, buffer of serialized values, etc.
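
Roughly what I have in mind, as a sketch in C only. All names here
(DemoColumnarContainer, demo_columnar_append, demo_columnar_find_tid) and the
fixed capacities are hypothetical, picked for illustration, and nothing like
this exists in the patch as far as I know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ITEMS   64
#define DEMO_VALUE_BYTES 512

/* Hypothetical decomposed container: one array per kind of data. */
typedef struct DemoColumnarContainer
{
    int      nitems;
    uint64_t tids[DEMO_MAX_ITEMS];              /* array of TIDs */
    uint64_t undo_ptrs[DEMO_MAX_ITEMS];         /* array of undo pointers */
    uint32_t value_offsets[DEMO_MAX_ITEMS + 1]; /* value i is [off[i], off[i+1]) */
    char     values[DEMO_VALUE_BYTES];          /* buffer of serialized values */
} DemoColumnarContainer;

/* Append one logical item by pushing onto each array separately.
 * The container must be zero-initialized before the first call.
 * Returns 0 on success, -1 if there is no room. */
static int
demo_columnar_append(DemoColumnarContainer *c, uint64_t tid,
                     uint64_t undo_ptr, const void *value, uint32_t len)
{
    uint32_t    start;

    assert(c != NULL && (value != NULL || len == 0));
    if (c->nitems >= DEMO_MAX_ITEMS)
        return -1;

    start = c->value_offsets[c->nitems];
    if (len > DEMO_VALUE_BYTES - start)
        return -1;

    c->tids[c->nitems] = tid;
    c->undo_ptrs[c->nitems] = undo_ptr;
    if (len > 0)
        memcpy(c->values + start, value, len);
    c->value_offsets[c->nitems + 1] = start + len;
    c->nitems++;
    return 0;
}

/* Look up a TID by scanning only the TID array; no undo pointers or
 * values are touched. Returns the item index, or -1 if not found. */
static int
demo_columnar_find_tid(const DemoColumnarContainer *c, uint64_t tid)
{
    for (int i = 0; i < c->nitems; i++)
    {
        if (c->tids[i] == tid)
            return i;
    }
    return -1;
}
```

Each array now holds values of a single type, so it can be compressed with a
scheme suited to that type, and a scan that only needs TIDs reads one dense
array instead of stepping over item headers.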

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services