From: Filip Janus <fjanus(at)redhat(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-01-04 23:43:05
Message-ID: CAFjYY++YjtunZcFt7MFBWwseP_=GS11=V36_xig07enHA0eRjA@mail.gmail.com
Lists: pgsql-hackers
I apologize for multiple messages, but I found a small bug in the previous
version.
-Filip-
On Sat, Jan 4, 2025 at 11:40 PM Filip Janus <fjanus(at)redhat(dot)com> wrote:
> Even though I started with lz4, I have also added pglz support and enhanced
> the memory management based on the review provided.
>
>
>
> -Filip-
>
>
> On Thu, Nov 28, 2024 at 12:32 PM Filip Janus <fjanus(at)redhat(dot)com> wrote:
>
>>
>> I've added a regression test for lz4 compression that runs only if the
>> server is compiled with the "--with-lz4" option.
>>
>> -Filip-
>>
>>
>> On Sun, Nov 24, 2024 at 3:53 PM Filip Janus <fjanus(at)redhat(dot)com> wrote:
>>
>>>
>>>
>>> -Filip-
>>>
>>>
>>> On Wed, Nov 20, 2024 at 1:35 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 11/18/24 22:58, Filip Janus wrote:
>>>> > ...
>>>> > Hi all,
>>>> > PostgreSQL supports data compression nowadays, but compression of
>>>> > temporary files has not been implemented yet. Huge queries can
>>>> > produce a significant amount of temporary data that needs to be
>>>> > stored on disk and causes many expensive I/O operations.
>>>> > I am attaching a proposed patch to enable temporary file compression,
>>>> > for hash joins for now. Initially, I've chosen the LZ4 compression
>>>> > algorithm. It would probably make better sense to start with pglz,
>>>> > but I realized it late.
>>>> >
>>>>
>>>> Thanks for the idea & patch. I agree this might be quite useful for
>>>> workloads generating a lot of temporary files for stuff like sorts etc.
>>>> I think it will be interesting to think about the trade-offs, i.e. how
>>>> to pick the compression level - at some point the compression ratio
>>>> stops improving while paying more and more CPU time. Not sure what the
>>>> right choice is, so using the default seems fine.
>>>>
>>>> I agree it'd be better to start with pglz, and only then add lz4 etc.
>>>> Firstly, pglz is simply the built-in compression, supported everywhere.
>>>> And it's also simpler to implement, I think.
>>>>
>>>> > # Future possible improvements
>>>> > Reducing the number of memory allocations within the dumping and
>>>> > loading of the buffer. I have two ideas for solving this problem. I
>>>> > would either add a buffer into struct BufFile or provide the buffer
>>>> > as an argument from the caller. For the sequential execution, I would
>>>> > prefer the second option.
>>>> >
>>>>
>>>> Yes, this would be good. Doing a palloc+pfree for each compression is
>>>> going to be expensive, especially because these buffers are going to be
>>>> large - likely larger than 8kB. Which means it's not cached in the
>>>> memory context, etc.
>>>>
>>>> Adding it to the BufFile is not going to fly, because that doubles the
>>>> amount of memory per file. And we already have major issues with hash
>>>> joins consuming massive amounts of memory. But at the same time the
>>>> buffer is only needed during compression, and there's only one at a
>>>> time. So I agree with passing a single buffer as an argument.
>>>>
>>>> > # Future plan/open questions
>>>> > In the future, I would like to add support for pglz and zstd.
>>>> > Further, I plan to extend the support of temporary file compression
>>>> > also to sorting, gist index creation, etc.
>>>> >
>>>> > Experimenting with the stream mode of compression algorithms: the
>>>> > compression ratio of LZ4 in block mode seems to be satisfactory, and
>>>> > the stream mode could produce a better ratio; however, it would
>>>> > consume more memory due to the requirement to store the context for
>>>> > LZ4 stream compression.
>>>> >
>>>>
>>>> One thing I realized is that this only enables temp file compression for
>>>> a single place - hash join spill files. AFAIK this is because compressed
>>>> files don't support random access, and the other places might need that.
>>>>
>>>> Is that correct? The patch does not explain this anywhere. If that's
>>>> correct, the patch probably should mention this in a comment for the
>>>> 'compress' argument added to BufFileCreateTemp(), so that it's clear
>>>> when it's legal to set compress=true.
>>>>
>>>
>>> I will add the description there.
>>>
>>>
>>>> Which other places might compress temp files? Surely hash joins are not
>>>> the only place that could benefit from this, right?
>>>>
>>>
>>> Yes, you are definitely right. I have chosen hash joins as a POC because
>>> there are no seeks besides the seek to the beginning of the buffer.
>>> I have focused on hash joins, but there are definitely other places
>>> where the compression could be used. I want to add support for them
>>> in the future.
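>>>
>>> Just to illustrate one of those other places (this is only an example of
>>> a workload that would be affected once sorting is covered, not something
>>> the current patch changes): a sort that exceeds work_mem also spills to
>>> temporary files, and those files are written and read sequentially:
>>>
>>>     -- with a small work_mem the plan reports
>>>     -- "Sort Method: external merge  Disk: ..."
>>>     SET work_mem = '1MB';
>>>     EXPLAIN (ANALYZE, COSTS OFF)
>>>     SELECT g FROM generate_series(1, 1000000) AS g
>>>     ORDER BY md5(g::text);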
>>>
>>>
>>>> Another thing is testing. If I run regression tests, it won't use
>>>> compression at all, because the GUC has "none" by default, right? But we
>>>> need some testing, so how would we do that? One option would be to add a
>>>> regression test that explicitly sets the GUC and does a hash join, but
>>>> that won't work with lz4 (because that may not be enabled).
>>>
>>>
>>> Right, it's "none" by default. My opinion is that we should test
>>> every supported compression method, so I will try to add an environment
>>> variable as you recommended.
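>>>
>>> For the pglz case a plain regression test should be enough, roughly of
>>> this shape (the GUC name below is just a placeholder for whatever the
>>> patch ends up calling it, and the planner settings only force a
>>> multi-batch hash join):
>>>
>>>     SET temp_file_compression = 'pglz';  -- hypothetical GUC name
>>>     SET work_mem = '64kB';               -- force batching / temp files
>>>     SET enable_mergejoin = off;
>>>     SET enable_nestloop = off;
>>>     SELECT count(*)
>>>     FROM generate_series(1, 100000) a(i)
>>>     JOIN generate_series(1, 100000) b(i) USING (i);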
>>>
>>>
>>>>
>>>> Another option might be to add a PG_TEST_xxx environment variable that
>>>> determines compression to use. Something like PG_TEST_USE_UNIX_SOCKETS.
>>>> But perhaps there's a simpler way.
>>>>
>>>> > # Benchmark
>>>> > I prepared three different databases to check expectations. Each
>>>> > dataset is described below. My testing demonstrates that my patch
>>>> > improves the execution time of huge hash joins. Also, my
>>>> > implementation should not negatively affect the performance of
>>>> > smaller queries. The amount of disk space needed for temporary files
>>>> > was reduced in every execution without a significant impact on
>>>> > execution time.
>>>> >
>>>> > ## Dataset A:
>>>> > Tables:
>>>> > table_a(bigint id, text data_text, integer data_number) - 10000000 rows
>>>> > table_b(bigint id, integer ref_id, numeric data_value, bytea data_blob) - 10000000 rows
>>>> > Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;
>>>> >
>>>> > The tables contain highly compressible data.
>>>> > The query demonstrated a reduction in temporary file usage from ~20GB
>>>> > to ~3GB, and this reduction also cut the execution time of the query
>>>> > by about ~10s.
>>>> >
>>>> >
>>>> > ## Dataset B:
>>>> > Tables:
>>>> > table_a(integer id, text data_blob) - 1110000 rows
>>>> > table_b(integer id, text data_blob) - 10000000 rows
>>>> > Query: SELECT * FROM table_a a JOIN table_b b ON a.id = b.id;
>>>> >
>>>> > The tables contain less compressible data; data_blob was generated
>>>> > by a pseudo-random generator.
>>>> > In this case, the data reduction was only ~50%. Also, the execution
>>>> > time was reduced only slightly with compression enabled.
>>>> >
>>>> > The second scenario demonstrates no overhead in the case of enabled
>>>> > compression and extended work_mem to avoid temp file usage.
>>>> >
>>>> > ## Dataset C:
>>>> > Tables:
>>>> > customers(integer, text, text, text, text)
>>>> > order_items(integer, integer, integer, integer, numeric(10,2))
>>>> > orders(integer, integer, timestamp, numeric(10,2))
>>>> > products(integer, text, text, numeric(10,2), integer)
>>>> >
>>>> > Query:
>>>> > SELECT p.product_id, p.name, p.price,
>>>> >        SUM(oi.quantity) AS total_quantity, AVG(oi.price) AS avg_item_price
>>>> > FROM eshop.products p
>>>> > JOIN eshop.order_items oi ON p.product_id = oi.product_id
>>>> > JOIN eshop.orders o ON oi.order_id = o.order_id
>>>> > WHERE o.order_date > '2020-01-01' AND p.price > 50
>>>> > GROUP BY p.product_id, p.name, p.price
>>>> > HAVING SUM(oi.quantity) > 1000
>>>> > ORDER BY total_quantity DESC LIMIT 100;
>>>> >
>>>> > This scenario should demonstrate a more realistic usage of the database.
>>>> > Enabled compression slightly reduced the temporary file usage, but
>>>> > the execution time wasn't affected by compression.
>>>> >
>>>> >
>>>> > +---------+-------------+------------+---------------------+----------+
>>>> > | Dataset | Compression | temp_bytes | Execution Time (ms) | work_mem |
>>>> > +---------+-------------+------------+---------------------+----------+
>>>> > | A       | Yes         | 3.09 GiB   | 22s586ms            | 4MB      |
>>>> > | A       | No          | 21.89 GiB  | 35s                 | 4MB      |
>>>> > +---------+-------------+------------+---------------------+----------+
>>>> > | B       | Yes         | 333 MB     | 1815.545 ms         | 4MB      |
>>>> > | B       | No          | 146 MB     | 1500.460 ms         | 4MB      |
>>>> > | B       | Yes         | 0 MB       | 3262.305 ms         | 80MB     |
>>>> > | B       | No          | 0 MB       | 3174.725 ms         | 80MB     |
>>>> > +---------+-------------+------------+---------------------+----------+
>>>> > | C       | Yes         | 40 MB      | 1011.020 ms         | 1MB      |
>>>> > | C       | No          | 53 MB      | 1034.142 ms         | 1MB      |
>>>> > +---------+-------------+------------+---------------------+----------+
>>>> >
>>>> >
>>>>
>>>> Thanks. I'll try to do some benchmarks on my own.
>>>>
>>>> Are these results from a single run, or an average of multiple runs?
>>>
>>>
>>> It is an average from multiple runs.
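>>>
>>> For reference, one way to capture the temporary file usage of a single
>>> run (temp_files / temp_bytes) is to reset the statistics before the query
>>> and read pg_stat_database afterwards:
>>>
>>>     SELECT pg_stat_reset();   -- reset counters for the current database
>>>     -- ... run the benchmark query ...
>>>     SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
>>>     FROM pg_stat_database
>>>     WHERE datname = current_database();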
>>>
>>>> Do you maybe have a script to reproduce this, including the data
>>>> generation?
>>>
>>>
>>> I am attaching my SQL file for database preparation. I also did further
>>> testing with two other machines (see attachment huge_tables.rtf).
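>>>
>>> For anyone who wants a quick approximation without the attachment, data
>>> of a shape similar to Dataset A (highly compressible text/blob columns)
>>> can be generated along these lines - a simplified sketch, not the exact
>>> script attached here:
>>>
>>>     -- repetitive values compress very well
>>>     CREATE TABLE table_a (id bigint, data_text text, data_number integer);
>>>     INSERT INTO table_a
>>>     SELECT g, repeat('compressible text ', 50), g % 1000
>>>     FROM generate_series(1, 10000000) g;
>>>
>>>     CREATE TABLE table_b (id bigint, ref_id integer, data_value numeric,
>>>                           data_blob bytea);
>>>     INSERT INTO table_b
>>>     SELECT g, g % 1000, g * 0.5, convert_to(repeat('blob', 200), 'UTF8')
>>>     FROM generate_series(1, 10000000) g;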
>>>
>>>>
>>>> Also, can you share some information about the machine used for this? I
>>>> expect the impact to strongly depend on memory pressure - if the temp
>>>> file fits into the page cache (and stays there), it may not benefit from
>>>> the compression, right?
>>>>
>>>
>>> If it fits into the page cache due to compression, I would consider that
>>> a benefit of compression.
>>> I performed further testing on machines with different memory sizes.
>>> Both experiments showed that compression was beneficial for execution
>>> time. The execution time reduction was more significant on the machine
>>> with less memory available.
>>>
>>> Tests were performed on:
>>> MacBook Pro M3 36GB - macOS
>>> Virtual machine ARM64 10GB / 6 CPUs - Fedora 39
>>>
>>>
>>>>
>>>> regards
>>>>
>>>> --
>>>> Tomas Vondra
>>>>
>>>>
Attachments:
0001-This-commit-adds-support-for-temporary-files-compres.patch (application/octet-stream, 77.6 KB)
0002-This-commit-enhance-temporary-file-compression.patch (application/octet-stream, 8.2 KB)
0003-Add-test-for-pglz-compression-of-temporary-files.patch (application/octet-stream, 63.3 KB)