From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | mark(at)mark(dot)mielke(dot)cc |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Faster StrNCpy |
Date: | 2006-10-02 18:30:11 |
Message-ID: | 13179.1159813811@sss.pgh.pa.us |
Lists: | pgsql-hackers pgsql-patches |
mark(at)mark(dot)mielke(dot)cc writes:
> Here is the cache hit case including your strlen+memcpy as 'LENCPY':
> $ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 -o x x.c y.c strlcpy.c ; ./x
> NONE: 696157 us
> MEMCPY: 825118 us
> STRNCPY: 7983159 us
> STRLCPY: 10787462 us
> LENCPY: 6048339 us
It appears that these results are a bit platform-dependent; on my x86_64
(Xeon) Fedora 5 box, I get
$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE: 358679 us
MEMCPY: 619255 us
STRNCPY: 8932551 us
STRLCPY: 9212371 us
LENCPY: 13910413 us
I'm not sure why the lencpy method sucks so badly on this machine :-(.
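For readers without the benchmark source at hand: the LENCPY case is described upthread only as "strlen+memcpy", so the following is a minimal sketch of what such a routine presumably looks like, not the code that was actually timed. The function name `lencpy`, its strlcpy-style return value, and its truncation handling are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strlen+memcpy bounded copy (name and semantics assumed,
 * modeled on strlcpy): copies at most siz-1 bytes, always NUL-terminates
 * when siz > 0, and returns strlen(src) so callers can detect truncation.
 */
size_t
lencpy(char *dst, const char *src, size_t siz)
{
    size_t      len = strlen(src);  /* first pass: measure the source */

    if (siz != 0)
    {
        /* copy the whole string if it fits, else the first siz-1 bytes */
        size_t      n = (len < siz) ? len : siz - 1;

        memcpy(dst, src, n);        /* second pass: bulk copy */
        dst[n] = '\0';
    }
    return len;
}
```

Note that this shape walks the source twice (once in strlen, once in memcpy); whether that explains the timings above is not something this sketch can establish.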
Anyway, I looked at glibc's strncpy and determined that on this machine
the only real optimization that's been done to it is to unroll the data
copying loop four times. I did the same to strlcpy (attached) and got
numbers like these:
$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE: 359317 us
MEMCPY: 619636 us
STRNCPY: 8933507 us
STRLCPY: 7644576 us
LENCPY: 13917927 us
$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" x.c y.c strlcpy.c
$ ./a.out
NONE: 502960 us
MEMCPY: 5382528 us
STRNCPY: 9733890 us
STRLCPY: 8740892 us
LENCPY: 15358616 us
$ gcc -O3 -std=c99 -DSTRING='"short"' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE: 358426 us
MEMCPY: 618533 us
STRNCPY: 6704926 us
STRLCPY: 867336 us
LENCPY: 10115883 us
$ gcc -O3 -std=c99 -DSTRING='"short"' -DN="(1024*1024)" x.c y.c strlcpy.c
$ ./a.out
NONE: 502746 us
MEMCPY: 5365171 us
STRNCPY: 7983610 us
STRLCPY: 5557277 us
LENCPY: 11533066 us
So the unroll seems to get us to the point of not losing compared to the
original strncpy code for any string length, and so I propose doing
that, if it holds up on other architectures.
regards, tom lane
size_t
strlcpy(char *dst, const char *src, size_t siz)
{
    char *d = dst;
    const char *s = src;
    size_t n = siz;

    /* Copy as many bytes as will fit */
    if (n != 0) {
        while (n > 4) {
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            n -= 4;
        }
        while (--n != 0) {
            if ((*d++ = *s++) == '\0')
                goto done;
        }
    }

    /* Not enough room in dst, add NUL and traverse rest of src */
    if (siz != 0)
        *d = '\0';      /* NUL-terminate dst */
    while (*s++)
        ;

done:
    return(s - src - 1);    /* count does not include NUL */
}
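
Since the unrolled loop changes how the size counter is stepped, one way to gain confidence in it is to compare it against a plain byte-at-a-time copy for sizes around the unroll boundary. The sketch below is an illustration under stated assumptions: `strlcpy_simple`, `strlcpy_unrolled` (the attached function, renamed so it cannot clash with a libc strlcpy) and `same_result` are hypothetical names, and the 64-byte scratch buffers are an arbitrary limit of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time reference with strlcpy semantics (illustrative only). */
static size_t
strlcpy_simple(char *dst, const char *src, size_t siz)
{
    size_t      i = 0;

    if (siz != 0)
    {
        while (i < siz - 1 && src[i] != '\0')
        {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';
    }
    return strlen(src);
}

/* The unrolled loop from the attachment above, under a different name. */
static size_t
strlcpy_unrolled(char *dst, const char *src, size_t siz)
{
    char       *d = dst;
    const char *s = src;
    size_t      n = siz;

    if (n != 0)
    {
        while (n > 4)
        {
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            if ((*d++ = *s++) == '\0')
                goto done;
            n -= 4;
        }
        while (--n != 0)
        {
            if ((*d++ = *s++) == '\0')
                goto done;
        }
    }

    /* Out of room: terminate dst and scan the rest of src for its length */
    if (siz != 0)
        *d = '\0';
    while (*s++)
        ;

done:
    return (s - src - 1);
}

/*
 * Hypothetical helper: returns 1 if both versions give the same return
 * value and leave identical bytes in the destination, 0 otherwise
 * (including when siz exceeds the scratch buffers of this sketch).
 */
int
same_result(const char *src, size_t siz)
{
    char        a[64];
    char        b[64];

    if (siz > sizeof(a))
        return 0;
    memset(a, 'x', sizeof(a));
    memset(b, 'x', sizeof(b));
    if (strlcpy_simple(a, src, siz) != strlcpy_unrolled(b, src, siz))
        return 0;
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling `same_result` for every size from 0 up to the buffer size, with both the "short" and the long benchmark strings, exercises the paths where the four-way loop exits early, exits exactly on a multiple of four, and hands off to the tail loop.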