Re: Optimizing COPY with SIMD

From: Neil Conway <neil(dot)conway(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing COPY with SIMD
Date: 2024-06-05 17:46:44
Message-ID: CAOW5sYaEvZzO9n5-fvA0fGKAZGD4XRVgA5QfbOeHjXn8F=HfWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review and feedback!

On Mon, Jun 3, 2024 at 10:56 AM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:

> > -/*
> > - * Send text representation of one attribute, with conversion and
> escaping
> > - */
> > #define DUMPSOFAR() \
>
> IIUC this comment was meant to describe the CopyAttributeOutText() function
> just below this macro. When the macro was added in commit 0a5fdb0 from
> 2006, the comment became detached from the function. Maybe we should just
> move it back down below the macro.
>

Ah, that makes sense -- done.

> > +/*
> > + * Send text representation of one attribute, with conversion and
> CSV-style
> > + * escaping. This variant uses SIMD instructions to optimize
> processing, but
> > + * we can only use this approach when encoding_embeds_ascii if false.
> > + */
>
> nitpick: Can we add a few words about why using SIMD instructions when
> encoding_embeds_ascii is true is difficult? I don't dispute that it is
> complex and/or not worth the effort, but it's not clear to me why that's
> the case just from reading the patch.
>

Sounds good.

> > +static void
> > +CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
> > + bool use_quote)
>
> nitpick: Can we add "vector" or "simd" to the name instead of "fast"? IMHO
> it's better to be more descriptive.
>

Sure, done.

Attached is a revised patch series, that incorporates the feedback above
and makes two additional changes:

* Add some regression tests to cover COPY behavior with octal and hex
escape sequences
* Optimize the COPY TO text (non-CSV) code path (CopyAttributeOutText()).

In CopyAttributeOutText(), I refactored some code into a helper function to
reduce code duplication, on the theory that field delimiters and escape
sequences are rare, so we don't mind taking a function call in those cases.

We could go further and use the same code to handle both the tail of the
string in the vectorized case and the entire string in the non-vectorized
case, but I didn't bother with that -- as written, it would require taking
an unnecessary strlen() of the input string in the non-vectorized case.

Performance for COPY TO in text (non-CSV) mode:

===
master

Benchmark 1: ./psql -f
/Users/neilconway/copy-out-bench-text-long-strings.sql
Time (mean ± σ): 1.240 s ± 0.013 s [User: 0.001 s, System: 0.000
s]
Range (min … max): 1.220 s … 1.256 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
Time (mean ± σ): 522.3 ms ± 11.3 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 512.0 ms … 544.3 ms 10 runs

master + SIMD patches:

Benchmark 1: ./psql -f
/Users/neilconway/copy-out-bench-text-long-strings.sql
Time (mean ± σ): 867.6 ms ± 12.7 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 842.1 ms … 891.6 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
Time (mean ± σ): 536.7 ms ± 10.9 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 530.1 ms … 566.8 ms 10 runs
===

Looks like there is a slight regression for short attribute values, but I
think the tradeoff is a net win.

I'm going to take a look at applying similar ideas to COPY FROM next.

Neil

Attachment Content-Type Size
v2-0001-Adjust-misleading-comment-placement.patch application/octet-stream 1000 bytes
v2-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch application/octet-stream 1.9 KB
v2-0003-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch application/octet-stream 6.1 KB
v2-0004-Optimize-COPY-TO-in-text-format-using-SIMD.patch application/octet-stream 8.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-06-05 17:49:58 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Previous Message Nathan Bossart 2024-06-05 17:45:48 Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade