From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> |
Subject: | Re: Performance Improvement by reducing WAL for Update Operation |
Date: | 2014-01-29 12:21:06 |
Message-ID: | CAA4eK1JeG2gL3UHPzseP4ynogeoBf-eHWk-cMX5AQqWeqYubLw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 29, 2014 at 3:41 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 01/28/2014 07:01 PM, Heikki Linnakangas wrote:
>>
>> On 01/27/2014 07:03 PM, Amit Kapila wrote:
>>>
>>> I have tried to improve the algorithm in another way so that we can get
>>> the benefit of the same chunks during find-match (something similar to LZ).
>>> The main change is to consider chunks at a fixed boundary (4 bytes)
>>> and, after finding a match, to check whether there is a longer match than
>>> the current chunk. While finding a longer match, it still takes care that
>>> the next, bigger match should be at a chunk boundary. I am not
>>> completely sure about the chunk boundary; maybe 8 or 16 would give
>>> better results.
>>
>>
>> Since you're only putting a value in the history every 4 bytes, you
>> wouldn't need to calculate the hash in a rolling fashion. You could just
>> take next four bytes, calculate hash, put it in history table. Then next
>> four bytes, calculate hash, and so on. Might save some cycles when
>> building the history table...
First of all, thanks for looking into the patch.
Yes, that's right, we can save cycles by not doing the rolling calculation
of the hash, and I was working to improve the patch along those lines. Earlier
the rolling hash was there because of Rabin's delta encoding, where we need to
check for a special match after each byte.
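To make the idea concrete, here is a rough sketch of the kind of build step I
have in mind (identifiers such as hash4, HIST_SIZE and build_history are only
illustrative, not the patch's actual names): since an entry is added only at
every 4-byte boundary, each chunk's hash can be computed directly from its
four bytes, with no rolling state carried between chunks.

#include <stdint.h>
#include <string.h>

#define HIST_SIZE   8192
#define CHUNK       4

/* plain multiply-by-prime hash of four bytes, computed from scratch */
static uint32_t
hash4(const unsigned char *p)
{
    return ((p[0] * 31u + p[1]) * 31u + p[2]) * 31u + p[3];
}

/* Build phase: one history entry per 4-byte chunk of the old tuple. */
static void
build_history(const unsigned char *oldp, int oldlen,
              int16_t heads[HIST_SIZE], int16_t *next)
{
    int         pos;

    memset(heads, 0xff, HIST_SIZE * sizeof(int16_t));  /* -1 means empty */

    for (pos = 0; pos + CHUNK <= oldlen; pos += CHUNK)
    {
        uint32_t    bucket = hash4(&oldp[pos]) % HIST_SIZE;
        int16_t     idx = (int16_t) (pos / CHUNK);

        /* prepend this chunk to its bucket's chain */
        next[idx] = heads[bucket];
        heads[bucket] = idx;
    }
}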
>
> On a closer look, you're putting a chunk in the history table only every
> four bytes, but you're *also* checking the history table for a match only
> every four bytes. That completely destroys the shift-resistance of the
> algorithm.
You are right that it will lose the shift-resistance, and Robert has
pointed this out to me as well; that's why he wants to maintain the property
of special bytes at chunk boundaries, as described in Rabin encoding. The only
real reason to shift to fixed-size chunks was to reduce CPU usage, and I
thought that most Update cases would modify fixed-length columns,
but that might not be true.
> For example, if the new tuple is an exact copy of the old tuple,
> except for one additional byte in the beginning, the algorithm would fail to
> recognize that. It would be good to add a test case like that in the test
> suite.
>
> You can skip bytes when building the history table, or when finding matches,
> but not both. Or you could skip N bytes, and then check for matches for the
> next four bytes, then skip again and so forth, as long as you always check
> four consecutive bytes (because the hash function is calculated from four
> bytes).
Can we do something like:
Build Phase
a. Calculate the hash and add an entry to the history table at every 4 bytes.
Match Phase
a. Calculate the hash in a rolling fashion and try to find a match at every byte.
b. When a match is found, then skip forward only in whole chunks, something
like I was doing in the find-match function:
+
+ /* consider only complete chunk matches. */
+ if (history_chunk_size == 0)
+ thislen += PGRB_MIN_CHUNK_SIZE;
+ }
Will this address the concern?
The main reason to process in chunks as much as possible is to save
CPU cycles. For example, if we build the hash table byte-by-byte, then even
in the best case, where most of the tuple has a match, there will be noticeable
overhead from building the hash table.
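As an illustrative sketch only (reusing the made-up hash4/CHUNK/history-array
names from the sketch above, not the patch's real code), the match phase would
scan the new tuple with a rolling hash at every byte, so a one-byte shift is
still detected, but a candidate match is verified and extended only in whole
4-byte chunks:

/*
 * Returns the match length in bytes (0 if none); the position of the match
 * in the new tuple and the offset in the old tuple are returned through
 * match_pos and match_off.
 */
static int
find_match(const unsigned char *oldp, int oldlen,
           const unsigned char *newp, int newlen,
           const int16_t heads[HIST_SIZE], const int16_t *next,
           int start, int *match_pos, int *match_off)
{
    const uint32_t  mult3 = 31u * 31u * 31u;    /* drops the oldest byte */
    uint32_t        h;
    int             pos;

    if (start + CHUNK > newlen)
        return 0;
    h = hash4(&newp[start]);

    for (pos = start; pos + CHUNK <= newlen; pos++)
    {
        int16_t     idx;

        for (idx = heads[h % HIST_SIZE]; idx >= 0; idx = next[idx])
        {
            int     off = idx * CHUNK;
            int     len = 0;

            /* verify and extend the match in complete chunks only */
            while (off + len + CHUNK <= oldlen &&
                   pos + len + CHUNK <= newlen &&
                   memcmp(&oldp[off + len], &newp[pos + len], CHUNK) == 0)
                len += CHUNK;

            if (len > 0)
            {
                *match_pos = pos;
                *match_off = off;
                return len;
            }
        }

        /* roll the hash forward by one byte */
        if (pos + CHUNK < newlen)
            h = (h - newp[pos] * mult3) * 31u + newp[pos + CHUNK];
    }
    return 0;
}

With this shape the history table is built only once per 4 bytes, while the
byte-by-byte scan during matching keeps the shift-resistance.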
>
> I couldn't resist the challenge, and started hacking this. First, some
> observations from your latest patch (pgrb_delta_encoding_v5.patch):
>
> 1. There are a lot of comments and code that refers to "chunks", which seem
> obsolete. For example, ck_size field in PGRB_HistEntry is always set to a
> constant, 4, except maybe at the very end of the history string. The
> algorithm has nothing to do with Rabin-Karp anymore.
>
> 2. The 'hindex' field in PGRB_HistEntry is unused. Also, ck_start_pos is
> redundant with the index of the entry in the array, because the array is
> filled in order. That only leaves us just the 'next' field, and that can be
> represented as a int16 rather than a pointer. So, we really only need a
> simple int16 array as the history entries array.
>
> 3. You're not gaining much by calculating the hash in a rolling fashion. A
> single rolling step requires two multiplications and two sums, plus shifting
> the variables around. Calculating the hash from scratch requires three
> multiplications and three sums.
>
> 4. Given that we're not doing the Rabin-Karp variable-length chunk thing, we
> could use a cheaper hash function to begin with. Like, the one used in pglz.
> The multiply-by-prime method probably gives fewer collisions than pglz's
> shift/xor method, but I don't think that matters as much as computation
> speed. No-one has mentioned or measured the effect of collisions in this
> thread; that either means that it's a non-issue or that no-one's just
> realized how big a problem it is yet. I'm guessing that it's not a problem,
> and if it is, it's mitigated by only trying to find matches every N bytes;
> collisions would make finding matches slower, and that's exactly what
> skipping helps with.
>
> After addressing the above, we're pretty much back to PGLZ approach.
Here, during the match phase, I think we can avoid copying literal bytes until
a match is found, which can save cycles in cases where the old and new
tuples are mostly different.
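As a rough illustration (building on the hypothetical find_match() above, with
emit_literals()/emit_match() as stand-ins for whatever the real output format
does), the encoding loop would only remember where the current run of unmatched
bytes starts and copy the whole run in one step when the next match, or the end
of the tuple, is reached:

#include <stdio.h>

/* placeholder output routines; the real code would write the delta format */
static void
emit_literals(const unsigned char *p, int len)
{
    (void) p;                   /* contents omitted in this sketch */
    if (len > 0)
        printf("literal run of %d bytes\n", len);
}

static void
emit_match(int off, int len)
{
    printf("match: old offset %d, length %d\n", off, len);
}

static void
encode_sketch(const unsigned char *oldp, int oldlen,
              const unsigned char *newp, int newlen,
              const int16_t heads[HIST_SIZE], const int16_t *next)
{
    int         pos = 0;

    while (pos < newlen)
    {
        int     match_pos;
        int     match_off;
        int     match_len = find_match(oldp, oldlen, newp, newlen,
                                       heads, next, pos,
                                       &match_pos, &match_off);

        if (match_len == 0)
            break;              /* no further matches; the rest is literal */

        /* one copy covers every literal byte since the previous match */
        emit_literals(&newp[pos], match_pos - pos);
        emit_match(match_off, match_len);
        pos = match_pos + match_len;
    }
    emit_literals(&newp[pos], newlen - pos);
}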
> I kept
> the change to only find matches every four bytes; that does make some
> difference. And I like having this new encoding code in a separate file, not
> mingled with the pglz stuff; it's sufficiently different that that's better. I
> haven't done all that much testing with this, so take it with a grain of salt.
>
> I don't know if this is better or worse than the other patches that have
> been floated around, but I thought I might as well share it.
Thanks for sharing the patch. I can take the data and compare it with the
existing approach, if you think the explanation I have given above for
changing the algorithm is okay.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com