Quick Links

Re: Re: CRC

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Guenter <bruceg(at)em(dot)ca>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: CRC
Date:	2000-12-09 23:46:23
Message-ID:	17602.976405583@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Bruce Guenter <bruceg(at)em(dot)ca> writes:
>> This is a lot closer than I'd have expected, but it sure ain't
>> "MD5 40% faster" as you reported. I wonder why the difference
>> in results between your platform and mine?

> The difference is likely because PA-RISC (like most other RISC
> architectures) lack a "roll" opcode that is very prevalent in the MD5
> algorithm.

A good theory, but unfortunately not a correct theory. PA-RISC can do a
circular shift in one cycle using the "shift right double" instruction,
with the same register specified as both high and low halves of the
64-bit operand. And gcc does know about that.

After some groveling through assembly code, it seems that the CRC-32
implementation is about as tight as it could get: two loads, two XORs,
and two EXTRU's per byte (one used to implement the right shift, the
other to implement masking with 0xFF). And the wall clock timing does
indeed come out to just about six cycles per byte. The MD5 code also
looks pretty tight. Each basic OP requires either two or three logical
operations (and/or/xor/not) depending on which round you're looking at,
plus four additions and a circular shift. PA-RISC needs two cycles to
load an arbitrary 32-bit constant, but other than that I see no wasted
cycles here:

ldil L'-1444681467,%r20
xor %r3,%r14,%r19
ldo R'-1444681467(%r20),%r20
and %r1,%r19,%r19
addl %r15,%r20,%r20
xor %r14,%r19,%r19
addl %r19,%r26,%r19
addl %r20,%r19,%r15
shd %r15,%r15,27,%r15
addl %r15,%r3,%r15

Note gcc has been smart enough to assign all the correct_words[] array
elements to registers, else we'd lose another cycle to a load operation
--- fortunately PA-RISC has lots of registers.

There are 64 of these basic OPs needed in each round, and each round
processes 64 input bytes, so basically you can figure one OP per byte.
Ignoring loop overhead and so forth, it's nine or ten cycles per byte
for MD5 versus six for CRC.

I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC. For one thing, it's going to be at a disadvantage
because it hasn't got enough registers. I'd be interested to see the
assembly code...

regards, tom lane

In response to

Re: Re: CRC at 2000-12-09 05:46:26 from Bruce Guenter

Responses

Re: Re: CRC at 2000-12-10 06:24:17 from Bruce Guenter
Re: Re: CRC at 2000-12-10 07:07:24 from Nathan Myers

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2000-12-09 23:58:01	Re: [COMMITTERS] pgsql/src/backend/commands (command.c vacuum.c)
Previous Message	Hiroshi Inoue	2000-12-09 23:25:24	RE: [COMMITTERS] pgsql/src/backend/commands (command.c vacuum.c)