Re: Proposal: Adding compression of temporary files

From: Filip Janus <fjanus(at)redhat(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-01-04 22:40:46
Message-ID: CAFjYY+KhiuzocMdgEr8wQtCkSaec=Mu_YiLE6N3JfYtQ-Xv=ug@mail.gmail.com
Lists: pgsql-hackers

Even though I started with LZ4, I have also added pglz support and enhanced
the memory management based on the review.
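
To make that concrete, here is a rough sketch of the idea (illustration
only, not the actual patch code - the enum, the function name and the
buffer handling are my assumptions): the block is compressed into a
scratch buffer supplied by the caller, using whichever method is
configured.

#include "postgres.h"
#include "common/pg_lzcompress.h"
#ifdef USE_LZ4
#include <lz4.h>
#endif

typedef enum TempCompressionMethod
{
    TEMP_COMP_NONE,
    TEMP_COMP_PGLZ,
    TEMP_COMP_LZ4
} TempCompressionMethod;

/*
 * Compress one temp-file block into 'dst', a scratch buffer supplied by
 * the caller (so there is no palloc/pfree per block).  'dst' must hold
 * at least PGLZ_MAX_OUTPUT(srclen) bytes.  Returns the compressed size,
 * or -1 if the block should be stored uncompressed.
 */
static int
compress_temp_block(TempCompressionMethod method,
                    const char *src, int srclen, char *dst, int dstcap)
{
    switch (method)
    {
        case TEMP_COMP_PGLZ:
            /* pglz_compress() returns -1 when compression does not pay off */
            return pglz_compress(src, srclen, dst, PGLZ_strategy_default);
#ifdef USE_LZ4
        case TEMP_COMP_LZ4:
            {
                int     clen = LZ4_compress_default(src, dst, srclen, dstcap);

                return (clen > 0) ? clen : -1;
            }
#endif
        default:
            return -1;          /* TEMP_COMP_NONE: store the block as-is */
    }
}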

-Filip-

On Thu, Nov 28, 2024 at 12:32, Filip Janus <fjanus(at)redhat(dot)com> wrote:

>
> I've added a regression test for lz4 compression if the server is compiled
> with the "--with-lz4" option.
>
> -Filip-
>
>
> On Sun, Nov 24, 2024 at 15:53, Filip Janus <fjanus(at)redhat(dot)com> wrote:
>
>>
>>
>> -Filip-
>>
>>
>> On Wed, Nov 20, 2024 at 1:35, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>>> Hi,
>>>
>>> On 11/18/24 22:58, Filip Janus wrote:
>>> > ...
>>> > Hi all,
>>> > PostgreSQL supports data compression nowadays, but compression of
>>> > temporary files has not been implemented yet. Huge queries can
>>> > produce a significant amount of temporary data that needs to be
>>> > stored on disk and causes many expensive I/O operations.
>>> > I am attaching a proposed patch that enables temporary file
>>> > compression for hash joins for now. Initially, I've chosen the LZ4
>>> > compression algorithm. It would probably make better sense to start
>>> > with pglz, but I realized that too late.
>>> >
>>>
>>> Thanks for the idea & patch. I agree this might be quite useful for
>>> workloads generating a lot of temporary files for stuff like sorts etc.
>>> I think it will be interesting to think about the trade-offs, i.e. how
>>> to pick the compression level - at some point the compression ratio
>>> stops improving while paying more and more CPU time. Not sure what the
>>> right choice is, so using the default seems fine.
>>>
>>> I agree it'd be better to start with pglz, and only then add lz4 etc.
>>> Firstly, pglz is simply the built-in compression, supported everywhere.
>>> And it's also simpler to implement, I think.
>>>
>>> > # Future possible improvements
>>> > Reducing the number of memory allocations when dumping and loading
>>> > the buffer. I have two ideas for solving this problem: I would
>>> > either add a buffer to struct BufFile or provide the buffer as an
>>> > argument from the caller. For sequential execution, I would prefer
>>> > the second option.
>>> >
>>>
>>> Yes, this would be good. Doing a palloc+pfree for each compression is
>>> going to be expensive, especially because these buffers are going to be
>>> large - likely larger than 8kB. Which means it's not cached in the
>>> memory context, etc.
>>>
>>> Adding it to the BufFile is not going to fly, because that doubles the
>>> amount of memory per file. And we already have major issues with hash
>>> joins consuming massive amounts of memory. But at the same time the
>>> buffer is only needed during compression, and there's only one at a
>>> time. So I agree with passing a single buffer as an argument.
>>>
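
That is what the enhanced memory management in the attached version aims
for. A minimal sketch of the idea (my illustration; the helper name and
the Max over both bounds are assumptions, not patch code): size the
scratch buffer once for the worst case of both methods and reuse it for
every block.

/* needs c.h (Max), common/pg_lzcompress.h and, with --with-lz4, <lz4.h> */
static Size
temp_compress_bound(Size rawlen)
{
#ifdef USE_LZ4
    return Max((Size) LZ4_compressBound((int) rawlen),
               (Size) PGLZ_MAX_OUTPUT(rawlen));
#else
    return (Size) PGLZ_MAX_OUTPUT(rawlen);
#endif
}

/* hypothetical caller: allocate once when the join starts spilling,
 * reuse the same buffer for every block, pfree it when spilling ends */
char   *scratch = palloc(temp_compress_bound(BLCKSZ));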
>>> > # Future plan/open questions
>>> > In the future, I would like to add support for pglz and zstd.
>>> > Further, I plan to extend temporary file compression to sorting,
>>> > GiST index creation, etc.
>>> >
>>> > Experimenting with the stream mode of the compression algorithms:
>>> > the compression ratio of LZ4 in block mode seems satisfactory, but
>>> > stream mode could produce a better ratio at the cost of more memory,
>>> > because the context for LZ4 stream compression has to be kept around.
>>> >
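
To illustrate the trade-off with liblz4's API (the wrappers below are
only my sketch, not code from the patch): block mode needs no per-file
state, while stream mode has to keep an LZ4_stream_t for the whole file.

#include <lz4.h>

/* Block mode: every block is compressed independently, no per-file state. */
static int
compress_block_mode(const char *raw, int rawlen, char *dst, int dstcap)
{
    return LZ4_compress_default(raw, dst, rawlen, dstcap);
}

/*
 * Stream mode: an LZ4_stream_t (plus a dictionary of recent data, e.g.
 * preserved with LZ4_saveDict) must live for the whole file so that later
 * blocks can reference earlier ones - usually a better ratio, but extra
 * memory per file.
 */
static int
compress_stream_mode(LZ4_stream_t *ctx,
                     const char *raw, int rawlen, char *dst, int dstcap)
{
    return LZ4_compress_fast_continue(ctx, raw, dst, rawlen, dstcap, 1);
}

The context would be created with LZ4_createStream() when the file starts
spilling and released with LZ4_freeStream() afterwards; that per-file
state is exactly the extra memory mentioned above.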
>>>
>>> One thing I realized is that this only enables temp file compression for
>>> a single place - hash join spill files. AFAIK this is because compressed
>>> files don't support random access, and the other places might need that.
>>>
>>> Is that correct? The patch does not explain this anywhere. If that's
>>> correct, the patch probably should mention this in a comment for the
>>> 'compress' argument added to BufFileCreateTemp(), so that it's clear
>>> when it's legal to set compress=true.
>>>
>>
>> I will add the description there.
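
Something along these lines, perhaps (draft wording only - the added
'compress' parameter is what the patch introduces, the comment text is
just a suggestion):

/*
 * BufFileCreateTemp
 * ...
 * If 'compress' is true, data written to the file is compressed block by
 * block.  A compressed BufFile does not support random access: the only
 * seek allowed is back to the beginning of the file, so pass
 * compress = true only for files that are written once and then read
 * back strictly sequentially (as hash join spill files are).
 */
extern BufFile *BufFileCreateTemp(bool interXact, bool compress);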
>>
>>
>>> Which other places might compress temp files? Surely hash joins are not
>>> the only place that could benefit from this, right?
>>>
>>
>> Yes, you are definitely right. I chose hash joins as a POC because
>> there are no seeks besides the seek back to the beginning of the
>> buffer. There are definitely other places where the compression could
>> be used as well, and I want to add support for them in the future.
>>
>>
>>> Another thing is testing. If I run regression tests, it won't use
>>> compression at all, because the GUC has "none" by default, right? But we
>>> need some testing, so how would we do that? One option would be to add a
>>> regression test that explicitly sets the GUC and does a hash join, but
>>> that won't work with lz4 (because that may not be enabled).
>>
>>
>> Right, it's "none" by default. In my opinion, we would like to test
>> every supported compression method, so I will try to add an
>> environment variable as you recommended.
>>
>>
>>>
>>> Another option might be to add a PG_TEST_xxx environment variable that
>>> determines compression to use. Something like PG_TEST_USE_UNIX_SOCKETS.
>>> But perhaps there's a simpler way.
>>>
>>> > # Benchmark
>>> > I prepared three different databases to check expectations. Each
>>> > dataset is described below. My testing demonstrates that my patch
>>> > improves the execution time of huge hash joins, while the
>>> > implementation should not negatively affect the performance of
>>> > smaller queries. Temporary file usage was reduced in every run
>>> > without a significant impact on execution time.
>>> >
>>> > ## Dataset A:
>>> > Tables:
>>> > table_a(bigint id, text data_text, integer data_number) - 10000000 rows
>>> > table_b(bigint id, integer ref_id, numeric data_value, bytea data_blob) - 10000000 rows
>>> > Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;
>>> >
>>> > The tables contain highly compressible data.
>>> > The query showed a reduction in temporary file usage from ~20 GB to
>>> > ~3 GB, and this reduction also cut the query's execution time by
>>> > about 10 s.
>>> >
>>> >
>>> > ## Dataset B:
>>> > Tables:
>>> > table_a(integer id, text data_blob) - 1110000 rows
>>> > table_b(integer id, text data_blob) - 10000000 rows
>>> > Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;
>>> >
>>> > The tables contain less compressible data; data_blob was generated
>>> > by a pseudo-random generator.
>>> > In this case, the data reduction was only ~50%, and the execution
>>> > time was reduced only slightly with compression enabled.
>>> >
>>> > The second scenario demonstrates no overhead when compression is
>>> > enabled and work_mem is extended to avoid temp file usage.
>>> >
>>> > ## Dataset C:
>>> > Tables:
>>> > customers(integer, text, text, text, text)
>>> > order_items(integer, integer, integer, integer, numeric(10,2))
>>> > orders(integer, integer, timestamp, numeric(10,2))
>>> > products(integer, text, text, numeric(10,2), integer)
>>> >
>>> > Query: SELECT p.product_id, p.name, p.price,
>>> > SUM(oi.quantity) AS total_quantity, AVG(oi.price) AS avg_item_price
>>> > FROM eshop.products p
>>> > JOIN eshop.order_items oi ON p.product_id = oi.product_id
>>> > JOIN eshop.orders o ON oi.order_id = o.order_id
>>> > WHERE o.order_date > '2020-01-01' AND p.price > 50
>>> > GROUP BY p.product_id, p.name, p.price
>>> > HAVING SUM(oi.quantity) > 1000
>>> > ORDER BY total_quantity DESC LIMIT 100;
>>> >
>>> > This scenario should demonstrate a more realistic usage of the
>>> > database. Enabled compression slightly reduced temporary file usage,
>>> > but the execution time wasn't affected by compression.
>>> >
>>> >
>>> > +---------+-------------+------------+----------------+----------+
>>> > | Dataset | Compression | temp_bytes | Execution Time | work_mem |
>>> > +---------+-------------+------------+----------------+----------+
>>> > | A       | Yes         | 3.09 GiB   | 22s586ms       | 4MB      |
>>> > | A       | No          | 21.89 GiB  | 35s            | 4MB      |
>>> > +---------+-------------+------------+----------------+----------+
>>> > | B       | Yes         | 333 MB     | 1815.545 ms    | 4MB      |
>>> > | B       | No          | 146 MB     | 1500.460 ms    | 4MB      |
>>> > | B       | Yes         | 0 MB       | 3262.305 ms    | 80MB     |
>>> > | B       | No          | 0 MB       | 3174.725 ms    | 80MB     |
>>> > +---------+-------------+------------+----------------+----------+
>>> > | C       | Yes         | 40 MB      | 1011.020 ms    | 1MB      |
>>> > | C       | No          | 53 MB      | 1034.142 ms    | 1MB      |
>>> > +---------+-------------+------------+----------------+----------+
>>> >
>>> >
>>>
>>> Thanks. I'll try to do some benchmarks on my own.
>>>
>>> Are these results from a single run, or an average of multiple runs?
>>
>>
>> It is an average of multiple runs.
>>
>>> Do you maybe have a script to reproduce this, including the data
>>> generation?
>>
>>
>> I am attaching my SQL file for database preparation. I also did
>> further testing on two other machines (see the attached
>> huge_tables.rtf).
>>
>>>
>>> Also, can you share some information about the machine used for this? I
>>> expect the impact to depend strongly on memory pressure - if the temp
>>> file fits into the page cache (and stays there), it may not benefit from
>>> the compression, right?
>>>
>>
>> If it fits into the page cache thanks to compression, I would consider
>> that a benefit of compression.
>> I performed further testing on machines with different memory sizes.
>> Both experiments showed that compression was beneficial for execution
>> time, and the reduction in execution time was more significant on the
>> machine with less memory available.
>>
>> Tests were performed on:
>> MacBook Pro M3, 36 GB - macOS
>> ARM64 virtual machine, 10 GB / 6 CPUs - Fedora 39
>>
>>
>>>
>>> regards
>>>
>>> --
>>> Tomas Vondra
>>>
>>>

Attachment Content-Type Size
0001-This-commit-adds-support-for-temporary-files-compres.patch application/octet-stream 77.6 KB
0002-This-commit-enhance-temporary-file-compression.patch application/octet-stream 7.8 KB
0003-Add-test-for-pglz-compression-of-temporary-files.patch application/octet-stream 63.3 KB
