From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-05-13 07:14:27
Message-ID: 20190513071427.GB2273@paquier.xyz
Lists: pgsql-hackers
On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.
Yes, I believe so too. pglz is a huge CPU consumer when it comes to
compression compared to more modern algorithms like lz4.
> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate performance of
> compression and decompression. This is an extension with a single
> function, test_pglz(), which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations
Cool. I got something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.
> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database. Any
> suggestions on more relevant data payloads are very welcome.
Text strings made of random data and variable lengths? For any test of
this kind I think it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs, so as to measure only the compression code and avoid extra
PG-related overhead like palloc() getting in the way. Strings from
1kB up to 16kB may be a reasonable range of sizes, and it is important
to keep the same uncompressed strings across runs for performance
comparison.
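
For reference, a bare-bones wrapper along those lines could look like
the sketch below (untested, just to illustrate the idea: the function
name bench_pglz and the payload generator are my assumptions, the
five-argument pglz_decompress() with its check_complete flag only
exists in recent branches, and the payload uses a small alphabet
rather than fully random bytes because pglz gives up on incompressible
input):

/*
 * bench_pglz.c -- illustrative sketch, not a tested module.
 */
#include "postgres.h"
#include "fmgr.h"
#include "common/pg_lzcompress.h"

#include <time.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(bench_pglz);

/*
 * Fill the buffer with text drawn from a small alphabet so that pglz
 * actually finds matches; purely random bytes would not compress.
 */
static void
fill_payload(char *buf, int32 len)
{
	for (int32 i = 0; i < len; i++)
		buf[i] = 'a' + (random() % 4);
}

/*
 * bench_pglz(rawsize int4, loops int4) -- times compression and
 * decompression of one fixed payload, reused across all iterations.
 */
Datum
bench_pglz(PG_FUNCTION_ARGS)
{
	int32		rawsize = PG_GETARG_INT32(0);
	int32		loops = PG_GETARG_INT32(1);
	char	   *raw = palloc(rawsize);
	char	   *compressed = palloc(PGLZ_MAX_OUTPUT(rawsize));
	char	   *restored = palloc(rawsize);
	int32		complen = 0;
	struct timespec t0, t1, t2;

	fill_payload(raw, rawsize);

	/* All palloc() work is done; only pglz calls sit in the timed loops. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int32 i = 0; i < loops; i++)
	{
		complen = pglz_compress(raw, rawsize, compressed,
								PGLZ_strategy_always);
		if (complen < 0)
			elog(ERROR, "payload did not compress");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	for (int32 i = 0; i < loops; i++)
	{
		if (pglz_decompress(compressed, complen, restored,
							rawsize, true) < 0)
			elog(ERROR, "decompression failed");
	}
	clock_gettime(CLOCK_MONOTONIC, &t2);

	elog(NOTICE, "size %d: compress %.3f s, decompress %.3f s", rawsize,
		 (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9,
		 (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9);

	pfree(raw);
	pfree(compressed);
	pfree(restored);
	PG_RETURN_VOID();
}

Something like SELECT bench_pglz(4096, 100000) for a couple of sizes,
with the generated payloads kept fixed across implementations, should
give numbers that are comparable across runs.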
> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster. Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.
That's nice.
> We focus only on boosting the existing codec, without any
> consideration of other compression algorithms.
There is this as well. A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz. A 10~20% improvement is something worth doing.
> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I have only Xeon-based servers and a few
> laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we rather focus on
> compression instead of decompression?
Decompression can matter a lot for mostly-read workloads and
compression can become a bottleneck for heavy-insert loads, so
improving compression and improving decompression should be treated as
two separate problems, not as one linked problem. Any improvement in
one or the other, or even both, is nice to have.
--
Michael