From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-09 16:54:03
Message-ID: bd979ec6-f6cf-464c-0dc2-e1c8e9326bc7@postgrespro.ru
Lists: pgsql-hackers
On 09.04.2019 19:19, Heikki Linnakangas wrote:
> On 09/04/2019 18:00, Konstantin Knizhnik wrote:
>> It looks like the original problem was caused by the internal Postgres
>> compressor: I had not configured Postgres to use lz4.
>> When I configured Postgres --with-lz4, data was correctly inserted into
>> the zedstore table, but it looks like it is not compressed at all:
>>
>> postgres=# select pg_relation_size('zedstore_lineitem_projection');
>> pg_relation_size
>> ------------------
>> 9363010640
>>
>> No wonder that zedstore shows the worst results:
>>
>> lineitem 6240.261 ms
>> lineitem_projection 5390.446 ms
>> zedstore_lineitem_projection 23310.341 ms
>> vops_lineitem_projection 439.731 ms
>>
>> An updated version of vstore_bench.sql is attached (sorry, there were
>> some errors in the previous version of this script).
>
> I tried this quickly, too. With default work_mem and no parallelism,
> and 1 gb table size, it seems that the query chooses a different plan
> with heap and zedstore, with a sort+group for zedstore and hash agg
> for heap. There's no ANALYZE support in zedstore yet, and we haven't
> given much thought to parallelism either. With work_mem='1GB' and no
> parallelism, both queries use a hash agg, and the numbers are much
> closer than what you saw, about 6 s for heap, and 9 s for zedstore.
>
> - Heikki
Yes, you were right. The plan for zedstore uses GroupAggregate instead
of HashAggregate.
Increasing work_mem forces the optimizer to use HashAggregate in all cases.
But it doesn't prevent memory overflow in my case.
And that is very strange to me, because there are just 4 groups in the
result, so it should not consume much memory.
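The plan choice is easy to verify directly. A minimal sketch (a simplified grouping query over the same table; the real TPC-H Q1 is more complex, and the column names l_returnflag/l_linestatus are the standard TPC-H ones assumed here):

```sql
-- Check which aggregate strategy the planner picks.
-- With the default work_mem it may fall back to Sort + GroupAggregate;
-- raising work_mem should switch it to HashAggregate.
SET work_mem = '1GB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT l_returnflag, l_linestatus, count(*)
FROM zedstore_lineitem_projection
GROUP BY l_returnflag, l_linestatus;
```

The BUFFERS option also shows how many pages were actually read, which helps confirm whether untouched columns are being fetched.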
Yet another strange thing is that the size of the zedstore table is 10Gb
according to pg_relation_size.
The Q1 query accesses only a subset of the "lineitem" columns, not
touching the largest ones (the text columns).
I have configured 12Gb of shared buffers, and all 11Gb of them are used!
It looks like all columns are fetched from disk.
And it looks like, besides these 11Gb of shared data, the backend (and
each parallel worker) also consumes several gigabytes of heap memory.
As a result, the total amount of memory used during parallel query
execution with 4 workers exceeds 20GB and causes terrible swapping on
my system.
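One way to check which relation is actually filling shared buffers is the pg_buffercache extension (shipped in contrib; this is the standard buffer-inspection query, assuming the extension can be installed):

```sql
-- Show which relations occupy the most shared buffers
-- (8192 is the default block size).
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 10;
```

If the zedstore relation accounts for nearly all buffered pages despite the query touching only a few columns, that would support the hypothesis that whole pages containing all columns are being read.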
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company