From: | "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> |
---|---|
To: | "'Simon Riggs'" <simon(at)2ndquadrant(dot)com> |
Cc: | "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Cost of XLogInsert CRC calculations |
Date: | 2005-05-16 08:03:51 |
Message-ID: | 9EB50F1A91413F4FA63019487FCD251D11332C@WEBBASEDDC.webbasedltd.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Simon Riggs [mailto:simon(at)2ndquadrant(dot)com]
> Sent: 12 May 2005 16:52
> To: Mark Cave-Ayland (External)
> Cc: 'Christopher Kings-Lynne'; 'Tom Lane'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: RE: [HACKERS] Cost of XLogInsert CRC calculations
(cut)
> It might be possible to speed things up by a factor of two using a
> two- byte lookup table rather than a one byte lookup. This would take
> up 65536 table entries, which is only 256KB. If we held that in shared
> memory we could calculate the entries at startup, or read them in from
> a file. It would only use the same space as if we had 255 connections,
> so not a great increase in memory usage overall. Need to look at the
> effect of retrieving everything from memory rather than from
> cache, so it probably wouldn't be twice as fast.
>
> Best Regards, Simon Riggs
Hi everyone,
As per this thread, I decided to spend a little time over the weekend
looking at the performance of the existing CRC64 code in order to determine
whether we could code the algorithm in a more efficient manner. In order to
do this, I devised a couple of test programs using code cut/pasted from
pg_crc.h (and pg_resetxlog) in order perform some timings.
These tests were done on two systems: a P4 1.8GHz laptop with MingW (Win32)
and gcc 3.2.3, and a Xeon 2.8GHz server running FC1 and gcc 3.3.2.
The program used to time the CRC64 calculation simply did the following:
1) Create an 8k buffer of data
2) Fill the buffer with some repeatable values; in this case the
algorithm used was the following:
for (i = 0 ; i < 8192 ; i++ )
{
*data = (char )(i % 256);
}
3) Calculate the CRC64 of the buffer 10,000 times and display the
result,
making a note of the start and end times. Using this information an
estimate
of the cost of a single CRC calculation can esaily be found.
I noticed looking at the code that there were two versions of the CRC code,
one using 32-bit arguments wrapped within a check for INT64_IS_BUSTED and
another using 64-bit unsigned integers. Since I wasn't sure initially which
one was being used, I decided to test both of them :) Both programs were
compiled using gcc's -O2 flag for optimisation.
The first thing to notice about the results was that I obtained different
CRC values using both algorithms on Win32, which were both different to the
results obtained under Linux:
Win32 MingW:
1) INT64_IS_BUSTED code (32-bit): 0x541d4a27 (crc1) 0x8dfa57ae
(crc0)
2) 64-bit code: 0x817f722b1f17b253 (crc0)
Linux (FC1):
1) INT64_IS_BUSTED code (32-bit): 0x21d064a2 (crc1) 0xe7c74332
(crc0)
2) 64-bit code: 0x21d064a2e7c74332 (crc0)
The second interesting thing to notice was the comparative speeds:
Win32 MingW:
1) INT64_IS_BUSTED code (32-bit): ~57us per calculation
2) 64-bit code: ~130us per calculation
Linux (FC1):
1) INT64_IS_BUSTED code (32-bit): ~29us per calculation
2) 64-bit code: ~76us per calculation
Looking at the flags in pg_config.h from both source builds, it would appear
that the 64-bit code is being used since the compiler defines
HAVE_LONG_LONG_INT_64. So that means there could be a potential 100%
performance increase by just using the existing INT64_IS_BUSTED code on x86
32 bit processors. I don't know given the rate of xlog records being written
whether or not this translates to more than a couple of minutes in an insert
of a million records or so.
I did have a look at the assembly code produced by the INT64_IS_BUSTED code
and it appears fairly tight; I'm no x86 expert but I think it would be hard
to optimise what was produced by the compiler. If anyone else is interested
in looking at these results then I will happily send the test programs
off-list - however my main concern is that the Win32 CRC64 results just
don't seem to be right :(
Kind regards,
Mark.
------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT
T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk
From | Date | Subject | |
---|---|---|---|
Next Message | Дмитрий Летучий | 2005-05-16 08:57:12 | postgreSQL as deductive DBMS |
Previous Message | Jeffrey Baker | 2005-05-16 06:58:13 | Re: bitmap scans, btree scans, and tid order |