Re: Zedstore - compressed in-core columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-14 16:36:18
Message-ID: 20190414163618.x2bhtfwdulhah7o5@development
Lists: pgsql-hackers

On Thu, Apr 11, 2019 at 04:52:33PM +0300, Konstantin Knizhnik wrote:
> On 11.04.2019 16:18, Andreas Karlsson wrote:
>
> On 4/11/19 10:46 AM, Konstantin Knizhnik wrote:
>
> These are my results of compressing pgbench data using different
> compressors:
>
> +-------------------------------------------------------------+
> |Configuration |Size (Gb) |Time (sec) |
> |---------------------------+----------------+----------------|
> |no compression |15.31 |92 |
> |---------------------------+----------------+----------------|
> |zlib (default level) |2.37 |284 |
> |---------------------------+----------------+----------------|
> |zlib (best speed) |2.43 |191 |
> |---------------------------+----------------+----------------|
> |postgres internal lz |3.89 |214 |
> |---------------------------+----------------+----------------|
> |lz4 |4.12 |95 |
> |---------------------------+----------------+----------------|
> |snappy |5.18 |99 |
> |---------------------------+----------------+----------------|
> |lzfse |2.80 |1099 |
> |---------------------------+----------------+----------------|
> |(apple) 2.80 1099 |1.69 |125 |
> +-------------------------------------------------------------+
>
> You see that zstd provides almost 2 times better compression ratio
> at almost the same speed.
>
> What is "(apple) 2.80 1099"? Was that intended to be zstd?
>
> Andreas
>
> Ugh...
> Cut and paste problems.
> The whole document can be found here:
> http://garret.ru/PageLevelCompression.pdf
>
> lzfse (apple)       2.80    1099
> zstd (facebook)  1.69    125
>
> zstd is a compression algorithm proposed by Facebook:
> https://github.com/facebook/zstd
> Looks like it provides the best speed/compress ratio result.
>

I think those comparisons are cute and we did a fair amount of them when
considering a drop-in replacement for pglz, but ultimately it might be a
bit pointless because:

(a) it very much depends on the dataset (one algorithm may work great on
one type of data, suck on another)

(b) different systems may require different trade-offs (high ingestion
rate vs. best compression ratio)

(c) decompression speed may be much more important

What I'm trying to say is that we shouldn't obsess too much about picking
one particular algorithm, because no single choice will be the best for
all datasets and workloads. Instead, we should probably design the system
to support different compression algorithms, ideally at column level.
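
To make the column-level idea a bit more concrete, here is a minimal
sketch in Python (not PostgreSQL code, and not anything from the zedstore
patch). The registry, the column names and the store_column/load_column
helpers are all hypothetical; only the zlib and lzma calls are real
standard-library functions.

```python
import lzma
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Codec:
    """A pair of functions implementing one compression method."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical registry: adding an algorithm means adding one entry here.
CODECS: Dict[str, Codec] = {
    "none": Codec(lambda b: b, lambda b: b),
    "zlib-fast": Codec(lambda b: zlib.compress(b, 1), zlib.decompress),
    "zlib": Codec(zlib.compress, zlib.decompress),
    "lzma": Codec(lzma.compress, lzma.decompress),
}

# Hypothetical per-column settings, standing in for a catalog option.
COLUMN_CODEC: Dict[str, str] = {
    "id": "none",            # cheap to read, not worth compressing
    "payload": "zlib-fast",  # favour ingestion rate
    "history": "lzma",       # favour compression ratio
}

DEFAULT_CODEC = "zlib"


def store_column(column: str, raw: bytes) -> Tuple[str, bytes]:
    """Compress a column chunk and tag it with the codec that was used."""
    name = COLUMN_CODEC.get(column, DEFAULT_CODEC)
    return name, CODECS[name].compress(raw)


def load_column(stored: Tuple[str, bytes]) -> bytes:
    """Decompress using the codec recorded next to the data."""
    name, blob = stored
    return CODECS[name].decompress(blob)
```

Recording the codec name next to the data means the per-column setting
can change later without rewriting chunks that were stored earlier.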

Also, while these general purpose algorithms are nice, what I think will
be important in later stages of colstore development will be compression
algorithms allowing execution directly on the compressed data (like RLE,
dictionary and similar approaches).
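
As an illustration of what "execution on compressed data" can mean, here
is a small Python sketch. The function names are made up for this example
and it does not reflect any existing colstore code; it only shows that
some aggregates and filters can be answered from RLE runs or dictionary
codes without expanding the column first.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def rle_encode(values: Sequence[int]) -> List[Tuple[int, int]]:
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_sum(runs: List[Tuple[int, int]]) -> int:
    """SUM(col) computed per run, one multiplication per run."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs: List[Tuple[int, int]], target: int) -> int:
    """COUNT(*) WHERE col = target, touching each run only once."""
    return sum(length for value, length in runs if value == target)


def dict_encode(values: Sequence[str]) -> Tuple[Dict[str, int], List[int]]:
    """Dictionary encode a column: distinct values map to small integers."""
    dictionary = {v: code for code, v in enumerate(sorted(set(values)))}
    return dictionary, [dictionary[v] for v in values]


def dict_count_equal(dictionary: Dict[str, int], codes: List[int],
                     target: str) -> int:
    """COUNT(*) WHERE col = target: one dictionary lookup, then the
    comparison runs on integer codes instead of the original strings."""
    code = dictionary.get(target)
    if code is None:
        return 0  # value never occurs, no need to scan the codes at all
    return sum(1 for c in codes if c == code)
```

For example, rle_encode([5, 5, 5, 7, 7, 5]) gives three runs, and both
rle_sum and rle_count_equal then visit three entries instead of six
values. The benefit grows with the length of the runs.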

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
