From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-09-16 10:21:38
Message-ID: 20140916102138.GI23806@awork2.anarazel.de
Lists: pgsql-hackers
On 2014-09-15 15:41:22 +0300, Heikki Linnakangas wrote:
> Here we go. I've split this again into two patches. The first patch is just
> refactoring the current code. It moves XLogInsert into a new file,
> xloginsert.c, and the definition of XLogRecord to a new xlogrecord.h header
> file. As a result, there is a lot of churn in the #includes in C files
> that generate WAL records, or contain redo routines. The number of files
> that pull in xlog.h - directly or indirectly through other headers - is
> greatly reduced.
>
> The second patch contains the interesting changes.
>
> I wrote a little benchmark kit to performance test this. I'm trying to find
> out two things:
>
> 1) How much CPU overhead do the new XLogBeginInsert and XLogRegister*
> functions add, compared to the current approach with XLogRecDatas.
>
> 2) How much extra WAL is generated with the patch. This affects the CPU time
> spent in the tests, but it's also interesting to measure directly, because
> WAL size affects many things like WAL archiving, streaming replication etc.
>
> Attached is the test kit I'm using. To run the battery of tests, use "psql
> -f run.sql". To answer the question of WAL volume, it runs a bunch of tests
> that exercise heap insert, update and delete, as well as b-tree and GIN
> insertions. To answer the second question, it runs a heap insertion test, with a
> tiny record size that's chosen so that it generates exactly the same amount
> of WAL after alignment with and without the patch. The test is repeated many
> times, and the median of runtimes is printed out.
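The median-of-runtimes step can be sketched in shell (a minimal illustration only; the actual run.sql harness is not shown here, and the timings.txt file name is hypothetical):

```shell
# Compute the median of repeated benchmark runtimes (one value per line,
# e.g. microseconds), assuming they have been collected into timings.txt.
sort -n timings.txt | awk '
    { a[NR] = $1 }
    END {
        if (NR % 2) print a[(NR + 1) / 2]             # odd count: middle value
        else print (a[NR / 2] + a[NR / 2 + 1]) / 2    # even count: mean of the two middle values
    }
'
```

Taking the median rather than the mean makes the result robust against occasional outlier runs (checkpoints, autovacuum, scheduling noise).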
>
> Here are the results, comparing unpatched and patched versions. First, the
> WAL sizes:
>
> Heap insertion records are 2 bytes larger with the patch. Due to
> alignment, that makes for a 0 or 8 byte difference in the record sizes.
> Other WAL records have a similar story: a few extra bytes, but no big
> regressions. There are a few outliers above where it appears that the
> patched version takes less space. Not sure why that would be; probably just
> a glitch in the test, autovacuum kicked in or something.
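The 0-or-8-byte effect follows from MAXALIGN rounding: on a platform with 8-byte alignment, each record length is rounded up to the next multiple of 8, so 2 extra bytes either disappear into existing padding or cost a full 8 bytes. A quick shell illustration (the record lengths here are made up for the example):

```shell
# MAXALIGN-style rounding of a length up to an 8-byte boundary.
maxalign() { echo $(( ($1 + 7) & ~7 )); }

# 30-byte record aligns to 32; with 2 extra bytes (32) it still aligns
# to 32 -- the growth vanishes into padding: +0.
maxalign 30   # -> 32
maxalign 32   # -> 32

# 32-byte record aligns to 32; with 2 extra bytes (34) it aligns to 40,
# so the 2-byte growth costs a full alignment quantum: +8.
maxalign 34   # -> 40
```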
I have to admit, that's already not a painless amount of overhead.
> Now, for the CPU overhead:
>
> description | dur_us (orig) | dur_us (patched) | %
> ----------------+---------------+------------------+--------
> heap insert 30 | 0.7752835 | 0.831883 | 107.30
> (1 row)
>
> So, the patched version runs 7.3 % slower. That's disappointing :-(.
>
> These are the results I got on my laptop today. Previously, the typical result
> I've gotten has been about 5%, so that's a bit high. Nevertheless, even a 5%
> slowdown is probably not acceptable.
Yes, I definitely think it's not.
> While I've been trying to nail down where that difference comes from, I've seen a
> lot of strange phenomena. At one point, the patched version was 10% slower,
> but I was able to bring the difference down to 5% if I added a certain
> function in xloginsert.c, but never called it. It was very repeatable at the
> time, I tried adding and removing it many times and always got the same
> result, but I don't see it with the current HEAD and patch version anymore.
> So I think 5% is pretty close to the margin of error that arises from
> different compiler optimizations, data/instruction cache effects etc.
>
> Looking at the 'perf' profile, the new function calls only amount to about
> 2% of overhead, so I'm not sure where the slowdown is coming from. Here are
> explanations I've considered, but I haven't been able to prove any of them:
I'd suggest doing:
a) perf stat -vvv of both workloads. That will often tell you stuff already
b) Look at other events. Particularly stalled-cycles-frontend,
stalled-cycles-backend, cache-misses
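Those two suggestions translate roughly to commands like the following (a sketch only: the workload invocation is taken from the test kit above, available event names vary by CPU and perf version, and the PID is a placeholder):

```shell
# a) Verbose counter statistics for the whole benchmark run.
perf stat -vvv psql -f run.sql

# b) Count stall and cache events instead of the default set; comparing
#    these between the patched and unpatched runs can show whether the
#    slowdown is frontend-bound, backend-bound, or cache-related.
perf stat -e stalled-cycles-frontend,stalled-cycles-backend,cache-misses \
    psql -f run.sql

# Alternatively, sample an already-running backend (PID is a placeholder)
# and inspect where the cache misses land:
perf record -e cache-misses -p 12345 -- sleep 30
perf report
```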
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services