From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enabling Checksums |
Date: | 2013-03-20 10:31:10 |
Message-ID: | CA+CSw_toidEtAh6uFP-tUgyY226b9hSru3A7YECpxEALL2oaqg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 20, 2013 at 5:40 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I see compatibility with the acceleration as a tie-breaker. If there's two
> approaches that are otherwise about equal, such as choosing the exact CRC
> polynomial, you might as well pick the one that works faster with Intel's
> SSE. I'll make sure that this gets benchmarked soon on a decent AMD system
> too though. I've been itching to assemble a 24 core AMD box at home anyway,
> this gives me an excuse to pull the trigger on that.
I went ahead and changed the hand coded ASM to do 16bit sums so it's
fully SSE2 based. While at it I moved some explicit address
calculation in the inner loop into addressing commands. I then tested
this on a 6 year old low end AMD Athlon 64 (I think it's a K8) for a
not-so-recent CPU data point. Results from a plain -O2 compile:
CRC32 slicing by 8 Algorithm (bytes/cycle), 0.649208
CRC32 zlib (bytes/cycle), 0.405863
Fletcher Algorithm: (bytes/cycle), 1.309119
Fletcher Algorithm hand unrolled: (bytes/cycle), 3.063854
SIMD Algorithm (gcc): (bytes/cycle), 0.453141
SIMD Algorithm (hand coded): (bytes/cycle), 4.481808
Slower speed of the SIMD is expected here as K8 only has 64bit data
paths. It does surprsiginly well on the CRC32 algorithm, probably
thanks to lower L1 latency.
The asm rewrite made Intel also faster, now runs on Sandy Bridge at
11.2 bytes/cycle.
New version of code attached for anyone who would like to test. Build
with "gcc -g -O2 crc.c 8x256_tables.c -lm -o crc". Run with "./crc -t
warm -d warm -i 1 -p 8192 -n 1000000". Should run without errors on
all x86-64 CPU's.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Attachment | Content-Type | Size |
---|---|---|
intel-slice-by-8.tar.gz | application/x-gzip | 66.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dimitri Fontaine | 2013-03-20 10:32:23 | Re: machine-parseable object descriptions |
Previous Message | Simon Riggs | 2013-03-20 09:00:05 | Re: Ignore invalid indexes in pg_dump |