From: Neil Conway <neil(dot)conway(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing COPY with SIMD
Date: 2024-06-07 18:07:36
Message-ID: CAOW5sYaNuci8gNgEPuk0mx2QXi1rJBikmS=dNmR2jpf0K+4svg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:
> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73). That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran. I
> wonder if you could do something similar here.
>
I didn't entirely follow what you are suggesting here -- seems like we
would need to do strlen() for the non-SIMD case if we tried to use a
similar approach.
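To spell out what I mean (purely a hypothetical sketch using the existing port/simd.h helpers, not the code from commit 7644a73): the final overlapping load has to be anchored at the end of the buffer, so we need the length up front, which for text-format attributes means a strlen().

#include "postgres.h"
#include "port/simd.h"

/*
 * Hypothetical sketch: search "len" bytes of "s" for byte "c" with
 * full-width vector loads, handling the tail with one final load that
 * overlaps bytes already examined, instead of a scalar tail loop.  Note
 * that anchoring that final load requires knowing "len" in advance.
 */
static bool
scan_has_byte(const char *s, Size len, char c)
{
    const uint8 *p = (const uint8 *) s;
    const uint8 *end = p + len;
    Vector8     chunk;

    if (len < sizeof(Vector8))
    {
        /* too short for even one vector load: plain byte loop */
        for (; p < end; p++)
        {
            if (*p == (uint8) c)
                return true;
        }
        return false;
    }

    /* full-width loads while a whole vector's worth of bytes remains */
    for (; p + sizeof(Vector8) <= end; p += sizeof(Vector8))
    {
        vector8_load(&chunk, p);
        if (vector8_has(chunk, (uint8) c))
            return true;
    }

    /* tail: one final load that overlaps the previous iteration */
    if (p < end)
    {
        vector8_load(&chunk, end - sizeof(Vector8));
        if (vector8_has(chunk, (uint8) c))
            return true;
    }
    return false;
}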
> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process. If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression. Maybe we can find
> some other small gains elsewhere to offset it.
>
For the particular short-strings benchmark I have been using (3 columns
with 8-character ASCII strings in each), I suspect the regression is caused
by the need to do a strlen(), rather than by the vectorized loop itself (we
skip the vectorized loop anyway, because sizeof(Vector8) == 16 on this
machine). (This explains why we see a regression on short strings for text
but not CSV: CSV needed to do a strlen() for the non-quoted-string case
regardless.) Unfortunately, this makes it tricky to make the optimization
conditional on the length of the string. I suppose we could play some games
where we start with a byte-by-byte loop and then switch over to the
vectorized path (and take a strlen()) once we have seen more than, say,
sizeof(Vector8) bytes -- something along the lines of the sketch below.
Seems a bit kludgy, though.
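Roughly like this (purely hypothetical, not part of the attached patches;
it assumes the same postgres.h / port/simd.h helpers as the sketch above,
with '\t' standing in for the configured delimiter):

static inline bool
is_special(uint8 c)
{
    /* default text-format special characters: delimiter, escape, CR, LF */
    return c == '\t' || c == '\\' || c == '\r' || c == '\n';
}

/*
 * Hypothetical sketch: return a pointer to the first special character in
 * a NUL-terminated attribute, or NULL if there is none.  Short attributes
 * are handled entirely by the byte loop and never pay for strlen().
 */
static const char *
find_special_hybrid(const char *string)
{
    const uint8 *p = (const uint8 *) string;
    Size        n;

    /* byte-at-a-time start */
    for (n = 0; n < sizeof(Vector8); n++, p++)
    {
        if (*p == '\0')
            return NULL;
        if (is_special(*p))
            return (const char *) p;
    }

    /* long attribute: take the strlen() hit and switch to vector loads */
    {
        const uint8 *end = p + strlen((const char *) p);

        while (p + sizeof(Vector8) <= end)
        {
            Vector8     chunk;

            vector8_load(&chunk, p);
            if (vector8_has(chunk, '\t') || vector8_has(chunk, '\\') ||
                vector8_has(chunk, '\r') || vector8_has(chunk, '\n'))
                break;          /* locate the exact byte below */
            p += sizeof(Vector8);
        }
        for (; p < end; p++)
        {
            if (is_special(*p))
                return (const char *) p;
        }
    }
    return NULL;
}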
I will do some more benchmarking and report back. For the time being, I'm
not inclined to push to get CopyAttributeOutTextVector() into the tree in
its current state, since I agree that the short-attribute case is quite
important.
In the meantime, attached is a revised patch series. This uses SIMD to
optimize CopyReadLineText in COPY FROM (a rough sketch of the idea follows
the numbers below). Performance results:
====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):     1.944 s ±  0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):   1.927 s …  1.975 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):     1.021 s ±  0.017 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   1.005 s …  1.053 s    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):     1.513 s ±  0.022 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):   1.493 s …  1.552 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):     1.032 s ±  0.032 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   1.009 s …  1.113 s    10 runs
====
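For reference, the gist of the CopyReadLineText change is roughly the
following (a simplified, hypothetical sketch reusing the port/simd.h
helpers from above, not the attached patch itself):

/*
 * Hypothetical sketch: report how many leading bytes of the input buffer
 * contain none of the characters CopyReadLineText's state machine cares
 * about ('\n', '\r', '\\'), so the caller can consume that run wholesale
 * and resume per-character processing at the first interesting byte.
 */
static int
skip_plain_bytes(const char *buf, int remaining)
{
    const uint8 *p = (const uint8 *) buf;
    int         skipped = 0;

    while (remaining - skipped >= (int) sizeof(Vector8))
    {
        Vector8     chunk;

        vector8_load(&chunk, p + skipped);
        if (vector8_has(chunk, '\n') ||
            vector8_has(chunk, '\r') ||
            vector8_has(chunk, '\\'))
            break;              /* an interesting byte is in this block */
        skipped += sizeof(Vector8);
    }
    return skipped;
}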
Neil
Attachments:
  v4-0005-Optimize-COPY-TO-in-text-format-using-SIMD.patch (application/octet-stream, 8.4 KB)
  v4-0003-Cosmetic-code-cleanup-for-CopyReadLineText.patch (application/octet-stream, 4.3 KB)
  v4-0004-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch (application/octet-stream, 6.1 KB)
  v4-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch (application/octet-stream, 1.9 KB)
  v4-0001-Adjust-misleading-comment-placement.patch (application/octet-stream, 1000 bytes)
  v4-0006-Optimize-COPY-FROM-using-SIMD.patch (application/octet-stream, 2.7 KB)