Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-01-11 06:08:38
Message-ID: CAA4eK1+wpwK=t=c4ga0K3_d1dDbbdtSXOpzS8_CB4w7hsg1QuQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 9:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> 2. Provide a new reloption to specify Wal compression
>> for update operation on table
>> Create table tbl(c1 char(100)) With (compress_wal = true);
>>
>> Alternative options:
>> a. compress_wal can take input as operation, e.g. 'insert', 'update',
>> b. use alternate syntax:
>> Create table tbl(c1 char(100)) Compress Wal For Update;
>> c. anything better?
>
> I think WITH (compress_wal = true) is pretty good. I don't understand
> your point about taking the operation as input, because this only
> applies to updates.

Yes, currently this applies only to updates. What I have in mind is that
if someone wants to use WAL compression for another operation in the
future, such as full-page writes, the option can be extended easily.

To be honest, I have not evaluated whether such a flag or compression
would make sense for full-page writes, but I think it should be possible
to check such a flag, if present, while doing a full-page write
(BkpBlock has the RelFileNode).

> But we could try to work "update" into the name
> of the setting somehow, so as to be less likely to conflict with
> future additions, like maybe wal_compress_update. I think the
> alternate syntax you propose is clearly worse, because it would
> involve adding new keywords, something we try to avoid.

Yes, this would be better than the current name; I will include it in the
next version of the patch unless there is a better idea than this one.
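
For reference, with the renamed reloption the syntax would presumably look
something like this (name as per your suggestion; not yet in the posted patch):

Create table tbl(c1 char(100)) With (wal_compress_update = true);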

>
>> Points to consider
>> -----------------------------
>>
>> 1. As the current algorithm store the entry for same chunks at head of list,
>> it will always find last but one chunk (we don't store last 4 bytes) for
>> long matching string during match phase in encoding (pgrb_delta_encode).
>>
>> We can improve it either by storing same chunks at end of list instead of at
>> head or by trying to find a good_match technique used in lz algorithm.
>> Finding good_match technique can have overhead in some of the cases
>> when there is actually no match.
>
> I don't see what the good_match thing has to do with anything in the
> Rabin algorithm.

The case for which I mentioned it is where most of the data is
repetitive and the modified tuple is almost the same as the original.

For example

Original tuple:
aaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Modified tuple:
ccaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Now let us see what happens with the current algorithm.

Step 1: Form the hash table (let's consider 4-byte chunks just for the
purpose of explanation):
   a. The first chunk is the first 4 a's (aaaa), with hash value 1024.
   b. After that, all the chunks are 4 b's (bbbb) and have the same hash
      value (let's call it H2), so they are mapped to the same bucket
      (call it B2) and form a list. Every new chunk with the same hash
      value is added to the front of the list, so B2's front element has
      hash value H2 and location 58 (8 bytes from the end, as we don't
      include the last 4 bytes in our algorithm).

Step 2: Perform encoding of the modified tuple using the hash table formed
in Step 1:
   a. The first chunk is the first 4 bytes (ccaa), with hash value 1056.
   b. Try to find a match in the hash table; there is no match, so proceed.
   c. The next chunk is 4 b's with hash value H2; try to find a match,
      and it finds one in bucket B2 (at the first element).
   d. Since the hash values match, it compares the chunks byte by byte;
      if all bytes match, it considers the chunk a matched_chunk.
   e. Now that the chunk matches, it tries to match consecutive bytes
      after the chunk (in the hope of finding a longer match); in this
      case it finds a match of 8 bytes and sets match_len to 8.
   f. It advances the pointer into the modified tuple (dp) to the byte
      just past match_len.
   g. It again considers the next 4 b's as a chunk and repeats steps c~f.

So what is happening here is that steps c~f keep getting repeated, whereas
if we had added identical chunks at the end of the list in step 1b, we
could have found the whole matching string in one pass of c~f.

The reason for adding identical chunks at the head of the list is that it
uses the same technique as pglz_hist_add. pglz, however, does not repeat
steps c~f, because it has the concept of good_match, which lets it get
this done in one go.

That said, I am really not sure how much real-world data falls into the
above category, or whether we should optimize based on the above example,
but yes, it would save some CPU cycles in the current test we are using.
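
For illustration, here is a rough sketch of the two insertion policies
being compared, head vs. tail of the bucket's list (types and names are
simplified for this example and are not the actual patch code):

#include <stdio.h>
#include <stdlib.h>

/* Simplified history entry: one entry per chunk, chained per hash bucket. */
typedef struct HistEntry
{
    int               pos;   /* offset of the chunk in the source tuple */
    struct HistEntry *next;
} HistEntry;

/*
 * Insert at the head of the bucket (what the current patch does, mirroring
 * pglz_hist_add): the most recently seen chunk is found first, so for a run
 * of identical chunks the match always starts near the end of the run.
 */
static HistEntry *
insert_at_head(HistEntry *bucket, int pos)
{
    HistEntry *e = malloc(sizeof(HistEntry));

    e->pos = pos;
    e->next = bucket;
    return e;
}

/*
 * Insert at the tail of the bucket (the alternative discussed above): the
 * earliest chunk of the run is found first, so one match can cover the
 * whole run instead of repeating steps c~f chunk by chunk.
 */
static HistEntry *
insert_at_tail(HistEntry *bucket, int pos)
{
    HistEntry *e = malloc(sizeof(HistEntry));
    HistEntry *it;

    e->pos = pos;
    e->next = NULL;
    if (bucket == NULL)
        return e;
    for (it = bucket; it->next != NULL; it = it->next)
        ;
    it->next = e;
    return bucket;
}

int
main(void)
{
    HistEntry *head_policy = NULL;
    HistEntry *tail_policy = NULL;

    /* Pretend every 4-byte chunk of b's (offsets 4, 8, ...) hashed to H2. */
    for (int pos = 4; pos <= 56; pos += 4)
    {
        head_policy = insert_at_head(head_policy, pos);
        tail_policy = insert_at_tail(tail_policy, pos);
    }

    printf("head policy: first candidate at offset %d (late in the run)\n",
           head_policy->pos);
    printf("tail policy: first candidate at offset %d (start of the run)\n",
           tail_policy->pos);
    return 0;
}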

>
> But I do think there might be a bug here, which is
> that, unless I'm misinterpreting something, hp is NOT the end of the
> chunk. After calling pgrb_hash_init(), we've looked at the first FOUR
> bytes of the input. If we find that we have a zero hash value at that
> point, shouldn't the chunk size be 4, not 1? And similarly if we find
> it after sucking in one more byte, shouldn't the chunk size be 5, not
> 2? Right now, we're deciding where the chunks should end based on the
> data in the chunk plus the following 3 bytes, and that seems wonky. I
> would expect us to include all of those bytes in the chunk.

It depends on how we define a chunk; basically, the chunk size is based
on the byte for which we compute the hindex. The hindex for any byte is
calculated from that byte and the following 3 bytes, so after calling
pgrb_hash_init(), even though we have looked at 4 bytes, the hindex is
still for the first byte, and that's why the chunk size is considered to
be 1, not 4.

Isn't this similar to how the current pglz works? It also uses the next
4 bytes to calculate the index (pglz_hist_idx) but still does a
byte-by-byte comparison; if we map that onto Rabin's delta encoding, the
chunk size is always 1.

If we follow the same logic to define chunks both for encoding and for
matching, will there be any problem?

I have tried to keep the implementation close to the previous LZ delta
encoding, but if you see a benefit in including the supporting bytes (the
next 3 bytes) in the chunk, then I can try to change it.
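
To make the two interpretations concrete, here is a minimal sketch (the
hash function and constants are made up for illustration; only the
attribution of the hash to byte i vs. bytes i..i+3 is the point):

#include <stdio.h>

/*
 * Illustrative only: the "hindex" of byte i is computed from bytes
 * i..i+3 (the byte itself plus 3 supporting bytes), similar in spirit
 * to pglz_hist_idx.  This is not the actual hash used by the patch.
 */
static unsigned int
hindex_at(const char *s, int i)
{
    return ((unsigned char) s[i] << 6) ^
           ((unsigned char) s[i + 1] << 4) ^
           ((unsigned char) s[i + 2] << 2) ^
            (unsigned char) s[i + 3];
}

int
main(void)
{
    const char *src = "aaaabbbb";

    /*
     * The hash computed right after pgrb_hash_init() has consumed 4 bytes
     * is attributed to byte 0.  Under the patch's interpretation a
     * boundary here means a chunk of size 1 (just byte 0); under the
     * alternative interpretation the chunk would also include the 3
     * supporting bytes, i.e. size 4.
     */
    printf("hindex attributed to byte 0 (computed from bytes 0..3): %u\n",
           hindex_at(src, 0));
    printf("chunk size if a boundary fires here: 1 (patch) vs 4 (alternative)\n");
    return 0;
}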

>> 2. Another optimization that we can do in pgrb_find_match(), is that
>> currently if
>> it doesn't find the first chunk (chunk got by hash index) matching, it
>> continues to find the match in other chunks. I am not sure if there is any
>> benefit to search for other chunks if first one is not matching.
>
> Well, if you took that out, I suspect it would hurt the compression
> ratio.

True, that is the reason I have kept it, but I was not sure what kinds of
scenarios it benefits and whether such scenarios are common for updates.
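
For clarity, this is roughly what "continue to find the match in other
chunks" means; a simplified sketch only, not the actual pgrb_find_match code:

/*
 * Walk the whole collision chain of the bucket, because a later entry can
 * still match byte-for-byte when the first entry was only a hash collision.
 */
#include <stddef.h>
#include <string.h>

typedef struct HistEntry
{
    const char       *chunk;       /* chunk bytes in the history (old tuple) */
    int               chunk_len;
    struct HistEntry *next;        /* next entry with the same hash value    */
} HistEntry;

const HistEntry *
find_match_in_bucket(const HistEntry *bucket, const char *input, int input_len)
{
    for (const HistEntry *e = bucket; e != NULL; e = e->next)
    {
        if (e->chunk_len <= input_len &&
            memcmp(e->chunk, input, e->chunk_len) == 0)
            return e;              /* a real byte-level match */
    }
    return NULL;                   /* nothing in this bucket matched */
}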

> Unless the CPU savings are substantial, I'd leave it alone.

Okay, let's leave it as it is.

>> 3. We can move code from pg_lzcompress.c to some new file pg_rbcompress.c,
>> if we want to move, then we need to either duplicate some common macros
>> like pglz_out_tag or keep it common, but might be change the name.
>
> +1 for a new file.

Okay, will take care of it in next version.

>
>> 4. Decide on min and max chunksize. (currently kept as 2 and 4 respectively).
>> The point to consider is that if we keep bigger chunk sizes, then it can
>> save us on CPU cycles, but less reduction in Wal, on the other side if we
>> keep it small it can have better reduction in Wal but consume more CPU
>> cycles.
>
> Whoa. That seems way too small. Since PGRB_PATTERN_AFTER_BITS is 4,
> the average length of a chunk is about 16 bytes. It makes little
> sense to have the maximum chunk size be 25% of the expected chunk
> length. I'd recommend making the maximum chunk length something like
> 4 * PGRB_CONST_NUM, and the minimum chunk length maybe something like
> 4.

Agreed, but I think for some strings (where it doesn't find the special
bit pattern) it will create long chunks, which can affect the reduction.
AFAIR, the current test we are using to evaluate the performance of this
patch has strings where it will do so, but on the other hand, that test
might not be representative of real update strings.

I will make these modifications and report here if they affect the results.
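
To make the trade-off concrete, here is a rough sketch of the boundary test
with min/max clamping (a sketch only: the boundary condition is assumed to
check the low PGRB_PATTERN_AFTER_BITS bits of the rolling hash, and the
PGRB_CONST_NUM value below is a placeholder, not taken from the patch):

#include <stdbool.h>
#include <stdio.h>

#define PGRB_PATTERN_AFTER_BITS  4                  /* as discussed above     */
#define PGRB_CONST_NUM           4                  /* placeholder value      */
#define PGRB_MIN_CHUNK_SIZE      4                  /* suggested minimum      */
#define PGRB_MAX_CHUNK_SIZE      (4 * PGRB_CONST_NUM)  /* suggested maximum   */

/*
 * A chunk ends when the low PGRB_PATTERN_AFTER_BITS bits of the rolling
 * hash are zero, subject to the min/max clamps.  With 4 boundary bits the
 * boundary fires with probability 1/16 per byte, so the expected chunk
 * length is about 16 bytes -- which is why a maximum of 4 is too small,
 * and why long runs without the bit pattern get cut off at the maximum.
 */
static bool
chunk_ends_here(unsigned int rolling_hash, int cur_chunk_len)
{
    if (cur_chunk_len < PGRB_MIN_CHUNK_SIZE)
        return false;              /* chunk still too short    */
    if (cur_chunk_len >= PGRB_MAX_CHUNK_SIZE)
        return true;               /* clamp overly long chunks */
    return (rolling_hash & ((1U << PGRB_PATTERN_AFTER_BITS) - 1)) == 0;
}

int
main(void)
{
    printf("expected chunk length: ~%d bytes\n", 1 << PGRB_PATTERN_AFTER_BITS);
    printf("min/max chunk size: %d / %d\n",
           PGRB_MIN_CHUNK_SIZE, PGRB_MAX_CHUNK_SIZE);

    /* example: at length 20 the max clamp forces a boundary regardless */
    printf("forced boundary at length 20: %s\n",
           chunk_ends_here(0xFFu, 20) ? "yes" : "no");
    return 0;
}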

>> 5. kept an guc variable 'wal_update_compression_ratio', for test purpose, we
>> can remove it before commit.
>
> Let's remove it now.

Sure, I will remove it in the next version.

>
>> 7. docs needs to be updated, tab completion needs some work.
>
> Tab completion can be skipped for now, but documentation is important.

Agreed; I left it out of this version of the patch so that we can first
conclude on the syntax, and then I can update the docs accordingly.

>> 8. We can extend Alter Table to set compress option for table.
>
> I don't understand what you have in mind here.

Let us say the user created a table without this new option (compress_wal)
and later wants to enable it; we can provide support similar to what we
provide for other storage parameters:
Alter Table tbl Set (compress_wal = true);

One more point to note: in the version of the patch I posted, the default
value of compress_wal is true, just for ease of testing; we might want to
change it to false for backward compatibility.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
