From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-11-27 05:56:33
Message-ID: CAA4eK1+X7p7o=UgAzSG1-72-YWTzDQ7DABA0er-PKP1_3zkAvQ@mail.gmail.com
Lists: pgsql-hackers
On Tue, Nov 26, 2013 at 8:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 22, 2013 at 2:31 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
> I spent a little time running the tests from Heikki's script under
> perf. On all three "two short fields" tests and also on the "ten tiny
> fields, all changed" test, we spend about 1% of the CPU time in
> pglz_delta_encode. I don't see any evidence that it's actually
> compressing anything at all; it appears to be falling out where we
> test the input length against the strategy, presumably because the
> default strategy (which we largely copy here) doesn't try to compress
> input data of less than 32 bytes. Given that this code isn't
> actually compressing anything in these cases, I'm a bit confused by
> Greg's report of substantial gains on "ten tiny fields, all changed"
> test; how can we win if we're not compressing?
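
For context, the length check being hit here is the strategy gate at the
top of the compressor. Below is a minimal stand-alone illustration; the
32-byte threshold mirrors PGLZ_strategy_default's min_input_size in the
9.3-era pg_lzcompress.c, paraphrased rather than copied:

#include <stdbool.h>
#include <stdio.h>

/*
 * Stand-alone illustration of the strategy gate referred to above.  The
 * threshold mirrors PGLZ_strategy_default's min_input_size (32 bytes) in
 * the 9.3-era pg_lzcompress.c; take it as a paraphrase of that check,
 * not a copy of the source.
 */
#define MIN_INPUT_SIZE 32		/* PGLZ_strategy_default->min_input_size */

static bool
strategy_allows_compression(int slen)
{
	/* pglz_compress()/pglz_delta_encode() give up here for short inputs */
	return slen >= MIN_INPUT_SIZE;
}

int
main(void)
{
	/* rows from the "ten tiny fields" tests fall under the 32-byte floor */
	printf("20-byte input:  %s\n", strategy_allows_compression(20) ? "try" : "skip");
	printf("200-byte input: %s\n", strategy_allows_compression(200) ? "try" : "skip");
	return 0;
}
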
I think it is mainly due to variation in the test data on Mac; if we
look at the Linux results, there is not much difference. Also, the
results posted by Heikki and by me at the links below don't show such
inconsistency for the "ten tiny fields, all changed" case.
http://www.postgresql.org/message-id/51366323.8070606@vmware.com
http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kapila@huawei.com
(Refer test_readings.txt)
> I studied the "hundred tiny fields, half nulled" test case in some
> detail. Some thoughts:
>
> - There is a comment "TODO: It would be nice to behave like the
> history and the source strings were concatenated, so that you could
> compress using the new data, too." If we're not already doing that,
> then how are we managing to compress WAL by more than one-quarter in
> the "hundred tiny fields, all changed" case?
The algorithm does not concatenate the history and source strings; the
hash table is built only from the history data, and a source position is
added to the history later only if no match is found for it. That could
be the reason for the above result.
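
To make that behaviour concrete, here is a very simplified,
self-contained sketch of my reading of the description above (single-slot
hash table, fixed-length matches, illustrative names only; it is not the
patch's pglz_delta_encode()):

#include <stdio.h>
#include <string.h>

/*
 * Simplified sketch: the lookup table is seeded only from the history
 * (old tuple); while encoding the source (new tuple), positions that
 * fail to match are then added to the table, so later parts of the
 * source can also match earlier parts of it.
 */
#define TABLE_SIZE	4096
#define MIN_MATCH	4

static const unsigned char *table[TABLE_SIZE];	/* last position seen for each hash */

static unsigned int
hash4(const unsigned char *p)
{
	/* simple 4-byte hash, not pglz's */
	return ((p[0] << 9) ^ (p[1] << 6) ^ (p[2] << 3) ^ p[3]) % TABLE_SIZE;
}

int
main(void)
{
	const unsigned char hist[] = "aaaa bbbb cccc dddd eeee ffff";
	const unsigned char src[]  = "aaaa bbbb XXXX dddd XXXX ffff";
	size_t		hlen = sizeof(hist) - 1;
	size_t		slen = sizeof(src) - 1;

	/* 1. seed the table from the history only */
	for (size_t i = 0; i + MIN_MATCH <= hlen; i++)
		table[hash4(hist + i)] = hist + i;

	/* 2. encode the source */
	for (size_t i = 0; i + MIN_MATCH <= slen; )
	{
		const unsigned char *cand = table[hash4(src + i)];

		if (cand && memcmp(cand, src + i, MIN_MATCH) == 0)
		{
			printf("match   at src offset %zu\n", i);
			i += MIN_MATCH;		/* real code would extend the match */
		}
		else
		{
			printf("literal at src offset %zu\n", i);
			table[hash4(src + i)] = src + i;	/* add unmatched source position */
			i++;
		}
	}
	return 0;
}
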
> It looks to me like the
> patch IS doing that, and I'm not sure it's a good idea, especially
> because it's using pglz_hist_add_no_recycle rather than pglz_add_hist:
> we verify that hlen + slen < 2 * PGLZ_HISTORY_SIZE but that doesn't
> seem good enough. On the "hundred tiny fields, half nulled" test case,
> removing that line reduces compression somewhat but also saves on CPU
> cycles.
> - pglz_find_match() is happy to walk all the way down even a really,
> really long bucket chain. It has some code that reduces good_match
> each time through, but it fails to produce a non-zero decrement once
> good_match * good_drop < 100. So if we're searching an enormously
> deep bucket many times in a row, and there are actually no matches,
> we'll go flying down the whole linked list every time. I tried
> mandating a minimum decrease of 1 and didn't observe that it made any
> difference in the run time of this test case, but it still seems odd.
> For the record, it's not new behavior with this patch; pglz_compress()
> has the same issue as things stand today. I wonder if we ought to
> decrease the good match length by a constant rather than a percentage
> at each step.
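
As a stand-alone illustration of that decrement behaviour (this is not
PostgreSQL source; the starting values mirror the default strategy's
match_size_good = 128 and match_size_drop = 10):

#include <stdio.h>

/*
 * pglz_find_match() lowers good_match by good_drop percent per
 * bucket-chain entry visited, but with integer arithmetic the decrement
 * hits zero once good_match * good_drop < 100, so a long chain with no
 * matches is still walked end to end.  Forcing a minimum decrement of 1
 * (or subtracting a constant) keeps it shrinking.
 */
int
main(void)
{
	int			good_drop = 10;		/* default strategy: drop 10% per entry */
	int			pct_only = 128;		/* default starting good_match */
	int			with_floor = 128;

	for (int visited = 1; visited <= 60; visited++)
	{
		int			d1 = (pct_only * good_drop) / 100;
		int			d2 = (with_floor * good_drop) / 100;

		pct_only -= d1;					/* becomes a no-op once it reaches 9 */
		with_floor -= (d2 > 1) ? d2 : 1;	/* suggested minimum decrement of 1 */
		if (with_floor < 1)
			with_floor = 1;
	}

	printf("after 60 entries: percentage-only = %d, with floor = %d\n",
		   pct_only, with_floor);
	return 0;
}
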
>
> - pglz_delta_encode() seems to spend about 50% of its CPU time loading
> up the history data. It would be nice to find a way to reduce that
> effort. I thought about maybe only making a history entry for, say,
> every fourth input position rather than every one, but a quick test
> seems to suggest that's a big fail, probably because it's too easy to
> skip over the position where we would have made "the right" match via
> some short match. But maybe there's some more sophisticated strategy
> here that would work better. For example, see:
>
> http://en.wikipedia.org/wiki/Rabin_fingerprint
>
> The basic idea is that you use a rolling hash function to divide up
> the history data into chunks of a given average size. So we scan the
> history data, compute a rolling hash value at each point, and each
> time the bottom N bits are zero, we consider that to be the end of a
> chunk. We enter all the chunks into a hash table. The chunk size
> will vary, but on the average, given a reasonably well-behaved rolling
> hash function (the pglz one probably doesn't qualify) it'll happen
> every 2^N bytes, so perhaps for this purpose we'd choose N to be
> between 3 and 5. Then, we scan the input we want to compress and
> divide it into chunks in the same way. Chunks that don't exist in the
> history data get copied to the output, while those that do get
> replaced with a reference to their position in the history data.
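
For anyone who wants to experiment with this, here is a small
self-contained sketch of the content-defined chunking idea just
described. It uses a simple multiplicative rolling hash rather than a
true Rabin fingerprint, and it only prints chunk boundaries; in a real
implementation the history chunks would go into a hash table that the
input chunks are probed against:

#include <stdio.h>
#include <string.h>

/*
 * Sketch of content-defined chunking (not pglz code).  A rolling hash is
 * computed over a small window; whenever its low CHUNK_BITS bits are
 * zero we declare a chunk boundary, so boundaries follow the content
 * rather than fixed offsets.
 */
#define CHUNK_BITS	4							/* average chunk ~ 2^4 = 16 bytes */
#define CHUNK_MASK	((1u << CHUNK_BITS) - 1)
#define WINDOW		8							/* bytes kept in the rolling hash */

static void
print_chunks(const char *label, const unsigned char *data, size_t len)
{
	unsigned int hash = 0;
	unsigned int pow_w = 1;						/* 31^WINDOW, to expire the oldest byte */
	size_t		start = 0;

	for (int k = 0; k < WINDOW; k++)
		pow_w *= 31;

	for (size_t i = 0; i < len; i++)
	{
		hash = hash * 31 + data[i];				/* bring the new byte in */
		if (i >= WINDOW)
			hash -= pow_w * data[i - WINDOW];	/* drop the byte leaving the window */

		if (i + 1 - start >= WINDOW && (hash & CHUNK_MASK) == 0)
		{
			printf("%s chunk: offset %zu, length %zu\n",
				   label, start, i - start + 1);
			start = i + 1;
		}
	}
	if (start < len)
		printf("%s chunk: offset %zu, length %zu (tail)\n",
			   label, start, len - start);
}

int
main(void)
{
	const char *history = "hundred tiny fields, half nulled; some old row image ...";
	const char *input   = "hundred tiny fields, all changed; some new row image ...";

	print_chunks("history", (const unsigned char *) history, strlen(history));
	print_chunks("input  ", (const unsigned char *) input, strlen(input));
	return 0;
}
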
>
> I'm not 100% certain that something like this would be better than
> trying to leverage the pglz machinery, but I think it might be worth
> trying. One possible advantage is that you make many fewer hash-table
> entries, which reduces both the cost of setting up the hash table and
> the cost of probing it; another is that if you find a hit in the hash
> table, you needn't search any further: you are done; this is related
> to the point that, for the most part, the processing is
> chunk-at-a-time rather than character-at-a-time, which might be more
> efficient. On the other hand, the compression ratio might stink, or
> it might suck for some other reason: I just don't know.
I think this idea looks better than the current one and will definitely
improve some of the cases, though I am not sure we can win in all cases.
We have tried a similar idea (reducing the size of the hash table and
eventually the number of comparisons) by adding a history entry only
every 10 bytes rather than for every byte. It improved most cases, but
not all ("hundred tiny fields, all changed" and
"hundred tiny fields, half changed" were still slow).
The patch and results are at the link below (refer to approach-1):
http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kapila@huawei.com
Now the tough question is what the possible options for this patch are,
and which one to pick:
a. Optimize the encoding technique so that it improves results in most
   cases, even if not all.
b. Have a table-level option or GUC to enable/disable WAL compression.
c. Use some heuristics to check whether the chances of compression are
   good, and only then perform compression (see the sketch after this
   list):
   1. apply this optimization only for tuple sizes > 128 and < 2000 bytes
   2. apply this optimization only if the number of modified columns is
      less than 25% (some threshold number) of the total columns.
      I think we can get the modified columns from the target entry and
      use that if triggers haven't changed the tuple. I remember there
      were earlier concerns that this value can't be trusted completely,
      but I think using it as a heuristic is not a problem, even if the
      number is not right in some cases.
d. Forget about this optimization and reject the patch.
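
As a hypothetical sketch of the heuristic in option (c), the function
and constant names below are illustrative only (they are not from any
posted patch); the numbers come straight from the list above, i.e. tuple
size between 128 and 2000 bytes and fewer than 25% of the columns
modified:

#include <stdbool.h>
#include <stdio.h>

#define MIN_TUPLE_SIZE			128		/* below this, compression rarely pays off */
#define MAX_TUPLE_SIZE			2000	/* above this, keep the existing behaviour */
#define MAX_MODIFIED_COL_PCT	25		/* threshold on fraction of changed columns */

static bool
wal_compression_worth_trying(int tuple_len, int modified_cols, int total_cols)
{
	if (tuple_len <= MIN_TUPLE_SIZE || tuple_len >= MAX_TUPLE_SIZE)
		return false;

	/*
	 * modified_cols would come from the target entry and is only a hint;
	 * per the discussion above it cannot be fully trusted (e.g. after
	 * triggers), which is acceptable for a heuristic.
	 */
	if (modified_cols * 100 >= total_cols * MAX_MODIFIED_COL_PCT)
		return false;

	return true;
}

int
main(void)
{
	/* e.g. a 600-byte tuple with 10 of its 100 columns changed */
	printf("%s\n", wal_compression_worth_trying(600, 10, 100) ? "compress" : "skip");
	return 0;
}
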
I think doing options 'b' and 'c' together can make this optimization
usable in the cases where it is actually useful.
Opinions/Suggestions?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com