From: Neil Conway <neil(dot)conway(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing COPY with SIMD
Date: 2024-06-07 18:07:36
Message-ID: CAOW5sYaNuci8gNgEPuk0mx2QXi1rJBikmS=dNmR2jpf0K+4svg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:
> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73). That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran. I
> wonder if you could do something similar here.
>
I didn't entirely follow what you are suggesting here -- seems like we
would need to do strlen() for the non-SIMD case if we tried to use a
similar approach.
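To spell out what I mean (purely a hypothetical sketch using the existing port/simd.h helpers, not the code from commit 7644a73): the final overlapping load has to be anchored at the end of the buffer, so we need the length up front, which for text-format attributes means a strlen().

#include "postgres.h"
#include "port/simd.h"

/*
 * Hypothetical sketch: search "len" bytes of "s" for byte "c" with
 * full-width vector loads, handling the tail with one final load that
 * overlaps bytes already examined, instead of a scalar tail loop.  Note
 * that anchoring that final load requires knowing "len" in advance.
 */
static bool
scan_has_byte(const char *s, Size len, char c)
{
    const uint8 *p = (const uint8 *) s;
    const uint8 *end = p + len;
    Vector8     chunk;

    if (len < sizeof(Vector8))
    {
        /* too short for even one vector load: plain byte loop */
        for (; p < end; p++)
        {
            if (*p == (uint8) c)
                return true;
        }
        return false;
    }

    /* full-width loads while a whole vector's worth of bytes remains */
    for (; p + sizeof(Vector8) <= end; p += sizeof(Vector8))
    {
        vector8_load(&chunk, p);
        if (vector8_has(chunk, (uint8) c))
            return true;
    }

    /* tail: one final load that overlaps the previous iteration */
    if (p < end)
    {
        vector8_load(&chunk, end - sizeof(Vector8));
        if (vector8_has(chunk, (uint8) c))
            return true;
    }
    return false;
}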
> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process. If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression. Maybe we can find
> some other small gains elsewhere to offset it.
>
For the particular short-strings benchmark I have been using (3 columns
with 8-character ASCII strings in each), I suspect the regression is caused
by the need to do a strlen(), rather than by the vectorized loop itself (we
skip the vectorized loop anyway, because sizeof(Vector8) == 16 on this
machine). (This explains why we see a regression on short strings for text
but not CSV: CSV needed to do a strlen() for the non-quoted-string case
regardless.) Unfortunately, this makes it tricky to make the optimization
conditional on the length of the string. I suppose we could play some games
where we start with a byte-by-byte loop and then switch over to the
vectorized path (and take a strlen()) once we have seen more than, say,
sizeof(Vector8) bytes -- something along the lines of the sketch below.
Seems a bit kludgy, though.
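Roughly like this (purely hypothetical, not part of the attached patches;
it assumes the same postgres.h / port/simd.h helpers as the sketch above,
with '\t' standing in for the configured delimiter):

static inline bool
is_special(uint8 c)
{
    /* default text-format special characters: delimiter, escape, CR, LF */
    return c == '\t' || c == '\\' || c == '\r' || c == '\n';
}

/*
 * Hypothetical sketch: return a pointer to the first special character in
 * a NUL-terminated attribute, or NULL if there is none.  Short attributes
 * are handled entirely by the byte loop and never pay for strlen().
 */
static const char *
find_special_hybrid(const char *string)
{
    const uint8 *p = (const uint8 *) string;
    Size        n;

    /* byte-at-a-time start */
    for (n = 0; n < sizeof(Vector8); n++, p++)
    {
        if (*p == '\0')
            return NULL;
        if (is_special(*p))
            return (const char *) p;
    }

    /* long attribute: take the strlen() hit and switch to vector loads */
    {
        const uint8 *end = p + strlen((const char *) p);

        while (p + sizeof(Vector8) <= end)
        {
            Vector8     chunk;

            vector8_load(&chunk, p);
            if (vector8_has(chunk, '\t') || vector8_has(chunk, '\\') ||
                vector8_has(chunk, '\r') || vector8_has(chunk, '\n'))
                break;          /* locate the exact byte below */
            p += sizeof(Vector8);
        }
        for (; p < end; p++)
        {
            if (is_special(*p))
                return (const char *) p;
        }
    }
    return NULL;
}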
I will do some more benchmarking and report back. For the time being, I'm
not inclined to push to get CopyAttributeOutTextVector() into the tree in
its current state, since I agree that the short-attribute case is quite
important.
In the meantime, attached is a revised patch series. This uses SIMD to
optimize CopyReadLineText in COPY FROM (a rough sketch of the idea follows
the numbers below). Performance results:
====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):     1.944 s ±  0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):   1.927 s …  1.975 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):     1.021 s ±  0.017 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   1.005 s …  1.053 s    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):     1.513 s ±  0.022 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):   1.493 s …  1.552 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):     1.032 s ±  0.032 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):   1.009 s …  1.113 s    10 runs
====
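For reference, the gist of the CopyReadLineText change is roughly the
following (a simplified, hypothetical sketch reusing the port/simd.h
helpers from above, not the attached patch itself):

/*
 * Hypothetical sketch: report how many leading bytes of the input buffer
 * contain none of the characters CopyReadLineText's state machine cares
 * about ('\n', '\r', '\\'), so the caller can consume that run wholesale
 * and resume per-character processing at the first interesting byte.
 */
static int
skip_plain_bytes(const char *buf, int remaining)
{
    const uint8 *p = (const uint8 *) buf;
    int         skipped = 0;

    while (remaining - skipped >= (int) sizeof(Vector8))
    {
        Vector8     chunk;

        vector8_load(&chunk, p + skipped);
        if (vector8_has(chunk, '\n') ||
            vector8_has(chunk, '\r') ||
            vector8_has(chunk, '\\'))
            break;              /* an interesting byte is in this block */
        skipped += sizeof(Vector8);
    }
    return skipped;
}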
Neil
Attachments:
  v4-0005-Optimize-COPY-TO-in-text-format-using-SIMD.patch (application/octet-stream, 8.4 KB)
  v4-0003-Cosmetic-code-cleanup-for-CopyReadLineText.patch (application/octet-stream, 4.3 KB)
  v4-0004-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch (application/octet-stream, 6.1 KB)
  v4-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch (application/octet-stream, 1.9 KB)
  v4-0001-Adjust-misleading-comment-placement.patch (application/octet-stream, 1000 bytes)
  v4-0006-Optimize-COPY-FROM-using-SIMD.patch (application/octet-stream, 2.7 KB)