From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Alexandra Wang <lewang(at)pivotal(dot)io>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, DEV_OPS <devops(at)ww-it(dot)cn>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Zedstore - compressed in-core columnar storage |
Date: | 2020-11-16 16:51:40 |
Message-ID: | CAHyXU0yvpbTCqYS-ebKO6+u-5KWKYfHoh9jwaq08aS7Va-_Rjg@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Nov 16, 2020 at 10:07 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
> On 11/16/20 1:59 PM, Merlin Moncure wrote:
> > On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>            master   zedstore/pglz   zedstore/lz4
> >> -------------------------------------------------
> >> copy         1855           68092           2131
> >> dump          751             905            811
> >>
> >> And the size of the lineitem table (as shown by \d+) is:
> >>
> >> master: 64GB
> >> zedstore/pglz: 51GB
> >> zedstore/lz4: 20GB
> >>
> >> It's mostly expected lz4 beats pglz in performance and compression
> >> ratio, but this seems a bit too extreme I guess. Per past benchmarks
> >> (e.g. [1] and [2]) the difference in compression/decompression time
> >> should be maybe 1-2x or something like that, not 35x like here.
> >
> > I can't speak to the ratio, but in basic backup/restore scenarios pglz
> > is absolutely killing me; performance is just awful; we are CPU-bound
> > in backups throughout the department. Installations defaulting to
> > pglz will make this feature show very poorly.
> >
>
> Maybe. I'm not disputing that pglz is considerably slower than lz4, but
> judging by previous benchmarks I'd expect the compression to be slower
> maybe by a factor of ~2x. So the 30x difference is suspicious. Similarly
> for the compression ratio - lz4 is great, but it seems strange it's 1/2
> the size of pglz. Which is why I'm speculating that something else is
> going on.
>
> As for the "pglz will make this feature show very poorly" I think that
> depends. I think we may end up with pglz doing pretty well (compared to
> heap), but lz4 will probably outperform that. OTOH for various use cases
> it may be more efficient to use something else with worse compression
> ratio, but allowing execution on compressed data, etc.

Hm, you might be right. Doing some number crunching, I'm getting
about 23 MB/s compression on a 600 GB backup image on a pretty typical
AWS server. That's obviously not great, but your numbers are much
worse than that, so something else may well be going on.
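
For reference, the arithmetic behind the figures quoted in this thread can be sketched in Python. This is only a rough check, not a benchmark: the timing table gives no unit, so only ratios are derived from it, and the backup duration assumes decimal units (1 GB = 1000 MB), which the thread does not specify. The function name `backup_hours` is made up for illustration.

```python
# Figures copied from the thread. The timing table states no unit,
# so only ratios between columns are computed from it.
copy_time = {"master": 1855, "zedstore/pglz": 68092, "zedstore/lz4": 2131}
table_size_gb = {"master": 64, "zedstore/pglz": 51, "zedstore/lz4": 20}

# How much slower the pglz copy is than the lz4 copy and than plain heap.
pglz_vs_lz4_copy = copy_time["zedstore/pglz"] / copy_time["zedstore/lz4"]
pglz_vs_master_copy = copy_time["zedstore/pglz"] / copy_time["master"]

# Size of the lz4 table relative to the pglz table.
lz4_vs_pglz_size = table_size_gb["zedstore/lz4"] / table_size_gb["zedstore/pglz"]


def backup_hours(image_gb, rate_mb_per_s, mb_per_gb=1000):
    """Hours needed to compress image_gb at rate_mb_per_s.

    mb_per_gb=1000 is an assumption (decimal units), not from the thread.
    """
    return image_gb * mb_per_gb / rate_mb_per_s / 3600


print(f"pglz copy vs lz4 copy:    {pglz_vs_lz4_copy:.1f}x")
print(f"pglz copy vs master copy: {pglz_vs_master_copy:.1f}x")
print(f"lz4 size / pglz size:     {lz4_vs_pglz_size:.2f}")
print(f"600 GB at 23 MB/s:        {backup_hours(600, 23):.1f} hours")
```

Under these assumptions the pglz copy comes out roughly 30x slower than the lz4 copy, which is the gap being called suspicious above.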

> I think we may end up with pglz doing pretty well (compared to heap)

I *don't* think so, or at least I'm skeptical as long as insertion
times are part of the overall performance measurement. Naturally,
with column stores, insertion times are often very peripheral to the
overall performance picture, but for cases where they aren't, I suspect
the results are not going to be pleasant, and I'd advise planning
accordingly.

As an aside, I am very interested in this work. I may be able to support
testing in an enterprise environment; let me know if interested. Thank you,

merlin