From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Binguo Bao <djydewang(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize partial TOAST decompression |
Date: | 2019-10-01 10:08:05 |
Message-ID: | 20191001100805.wnnuj73kuzfwzs56@development |
Lists: | pgsql-hackers |
On Tue, Oct 01, 2019 at 11:20:39AM +0500, Andrey Borodin wrote:
>
>
>> On 30 Sep 2019, at 22:29, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:
>>>
>>>
>>>> On 30 Sep 2019, at 20:56, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>>
>>>> I mean this:
>>>>
>>>> /*
>>>> * Use int64 to prevent overflow during calculation.
>>>> */
>>>> compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
>>>>
>>>> I'm not very familiar with pglz internals, but I'm a bit puzzled by
>>>> this. My first instinct was to compare it to this:
>>>>
>>>> #define PGLZ_MAX_OUTPUT(_dlen) ((_dlen) + 4)
>>>>
>>>> but clearly that's a very different (much simpler) formula. So why
>>>> shouldn't pglz_maximum_compressed_size simply use this macro?
>>
>>>
>>> compressed_size accounts for possible increase of size during
>>> compression. pglz can consume up to 1 control byte for each 8 bytes of
>>> data in worst case.
>>
>> OK, but does that actually translate into the formula? We essentially
>> need to count 8-byte chunks in raw data, and multiply that by 9. Which
>> gives us something like
>>
>> nchunks = ((rawsize + 7) / 8) * 9;
>>
>> which is not quite what the patch does.
>
>I'm afraid neither formula is correct, but these are all hair-splitting differences.
>
Sure. I just want to be sure the formula is safe and we won't end up
using too low a value in some corner case.
>Your formula does not account for the fact that we may not need all bytes from last chunk.
>Consider a desired decompressed size of 3 bytes. We may need 1 control byte and 3 literals, 4 bytes total.
>But nchunks = 9.
>
OK, so essentially this means my formula works with whole chunks, i.e.
if we happen to need just a part of a decompressed chunk, we still
request enough data to decompress it whole. This way we may request up
to 7 extra bytes, which seems fine.
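
A quick sketch to check the "up to 7 extra bytes" claim. It assumes the
tight requirement is one control bit per data byte, rounded up to whole
bytes; the function names are made up for illustration and are not part
of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: whole 8-byte chunks, 9 compressed bytes per chunk. */
static int64_t
chunk_bound(int64_t rawsize)
{
    return ((rawsize + 7) / 8) * 9;
}

/* Hypothetical helper: rawsize literals plus ceil(rawsize / 8) control bytes. */
static int64_t
tight_bound(int64_t rawsize)
{
    return (rawsize * 9 + 7) / 8;
}

/* Largest amount by which chunk_bound exceeds tight_bound for 1..limit. */
static int64_t
max_overshoot(int64_t limit)
{
    int64_t worst = 0;

    for (int64_t n = 1; n <= limit; n++)
    {
        int64_t extra = chunk_bound(n) - tight_bound(n);

        if (extra > worst)
            worst = extra;
    }
    return worst;
}
```

Under these assumptions the difference works out to 8 * ceil(n / 8) - n,
so it should stay between 0 and 7 for any n.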
>Binguo's formula is appending 1 control bit per data byte and one extra
>control byte. Consider size = 8 bytes. We need 1 control byte, 8
>literals, 9 total. But compressed_size = 10.
>
>The mathematically correct formula is compressed_size = (int32) ((int64)
>rawsize * 9 + 7) / 8; Here we take one bit for each data byte, and 7
>control bits for overflow.
>
>But these equations make no big difference, each formula is safe. I'd
>pick the one which is easier to understand and document (IMO, it's nchunks =
>((rawsize + 7) / 8) * 9).
>
I'd use the *mathematically correct* formula; it doesn't seem to be any
more complex, and the "one bit per byte, complete bytes" explanation
seems quite understandable.
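
For reference, a minimal side-by-side sketch of the three formulas from
this thread. The function names are hypothetical, and the sketch divides
in int64 before narrowing to int32; in the expression as quoted, the
(int32) cast appears to bind before the division, so I'm assuming
divide-first is the intent. It also assumes the result fits in int32.

```c
#include <stdint.h>

/* Formula from the patch: one control bit per byte plus one extra byte. */
static int32_t
bound_patch(int32_t rawsize)
{
    return (int32_t) (((int64_t) rawsize * 9 + 8) / 8);
}

/* Whole-chunk formula: 9 compressed bytes per (possibly partial) 8-byte chunk. */
static int32_t
bound_whole_chunks(int32_t rawsize)
{
    return (int32_t) ((((int64_t) rawsize + 7) / 8) * 9);
}

/* "One bit per byte, complete bytes": 9 bits per data byte, rounded up. */
static int32_t
bound_exact(int32_t rawsize)
{
    return (int32_t) (((int64_t) rawsize * 9 + 7) / 8);
}
```

With rawsize = 3 these give 4, 9 and 4 respectively; with rawsize = 8
they give 10, 9 and 9, matching the examples quoted above.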
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services