From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: WAL format and API changes (9.5) |
Date: | 2014-10-03 12:51:37 |
Message-ID: | 542E9BD9.3000600@vmware.com |
Lists: | pgsql-hackers |
On 09/16/2014 01:21 PM, Andres Freund wrote:
>> >While I've been trying to nail down where that difference comes from, I've seen a
>> >lot of strange phenomena. At one point, the patched version was 10% slower,
>> >but I was able to bring the difference down to 5% if I added a certain
>> >function in xloginsert.c, but never called it. It was very repeatable at the
>> >time, I tried adding and removing it many times and always got the same
>> >result, but I don't see it with the current HEAD and patch version anymore.
>> >So I think 5% is pretty close to the margin of error that arises from
>> >different compiler optimizations, data/instruction cache effects etc.
>> >
>> >Looking at the 'perf' profile, the new function calls only amount to about
>> >2% of overhead, so I'm not sure where the slowdown is coming from. Here are
>> >explanations I've considered, but I haven't been able to prove any of them:
> I'd suggest doing:
> a) perf stat -vvv of both workloads. That will often tell you stuff already
> b) Look at other events. Particularly stalled-cycles-frontend,
> stalled-cycles-backend, cache-misses
That didn't make me any wiser, unfortunately.
After a lot of experimentation, I figured out that the slowdown is
already apparent with the *first* patch, the one that just refactors
existing XLogInsert, without any changes to the WAL format or to the
callers. I fiddled with that for a long time, trying to tease apart the
change that makes the difference, and was finally able to narrow it down.
Attached are two patches. These are both just refactoring, with no
changes to the WAL format or APIs. The first is the same as the
refactoring patch I posted earlier, with only minor changes to
#includes, comments and such (per Alvaro's and Michael's suggestions -
thanks!). With the first patch, the test case I've been using for
performance testing becomes somewhere between 5% and 10% slower.
Applying the second patch on top of that restores performance to
what you get without these patches.
Strange. The second patch moves the CRC calculation from a separate loop
over the XLogRecDatas into the earlier loop that also iterates through
the XLogRecDatas. The strange thing about this is that when I tried to
make the same change to current git master, without applying the first
patch, it didn't make any difference. The CRC calculation used to be
integrated into the earlier loop in 9.1 and before, but in 9.2 it was
moved to a separate loop for simplicity, because it didn't make any
difference to performance.
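To make the shape of that change concrete, here is a minimal sketch of the two loop structures. It is not the code from either patch: RecData is a hypothetical stand-in for a chain of XLogRecData entries, the function names are made up for illustration, and the checksum is a generic bitwise CRC-32 that is not necessarily the variant PostgreSQL uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain of record data chunks (assumption:
 * this only mimics the linked-list shape, not the real XLogRecData). */
typedef struct RecData
{
    const unsigned char *data;
    size_t      len;
    struct RecData *next;
} RecData;

/* Generic bitwise CRC-32 state update (reflected polynomial 0xEDB88320). */
static uint32_t
crc_update(uint32_t crc, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Variant A: one pass to total the lengths, then a separate pass for the CRC. */
static uint32_t
crc_separate_loop(const RecData *head, size_t *total_len)
{
    size_t      len = 0;
    uint32_t    crc = 0xFFFFFFFFu;

    for (const RecData *r = head; r != NULL; r = r->next)
        len += r->len;

    for (const RecData *r = head; r != NULL; r = r->next)
    {
        assert(r->len == 0 || r->data != NULL);
        crc = crc_update(crc, r->data, r->len);
    }

    *total_len = len;
    return crc ^ 0xFFFFFFFFu;
}

/* Variant B: the CRC is folded into the same pass that totals the lengths. */
static uint32_t
crc_fused_loop(const RecData *head, size_t *total_len)
{
    size_t      len = 0;
    uint32_t    crc = 0xFFFFFFFFu;

    for (const RecData *r = head; r != NULL; r = r->next)
    {
        assert(r->len == 0 || r->data != NULL);
        len += r->len;
        crc = crc_update(crc, r->data, r->len);
    }

    *total_len = len;
    return crc ^ 0xFFFFFFFFu;
}
```

Both variants visit the same bytes in the same order and return the same checksum and total length, so any timing difference between them would have to come from code layout, inlining or cache behaviour rather than from the amount of work done.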
So I now have a refactoring patch ready that I'd like to commit (the
attached two patches together), but to be honest, I have no idea why the
second patch is so essential to performance.
If someone else wants to try this out, the performance difference can be
seen with the test suite I posted earlier, or with an even simpler
pgbench test script:
-- create the test table once, truncate between pgbench runs:
create table test (id int4);
-- pgbench script:
insert into test select i from generate_series(1,100) i;
- Heikki
Attachment | Content-Type | Size |
---|---|---|
0001-Move-the-backup-block-logic-from-XLogInsert-to-a-new.patch | text/x-diff | 89.5 KB |
0002-Move-the-CRC-calculations-into-the-other-loop-throug.patch | text/x-diff | 5.0 KB |