Re: Performance degradation on concurrent COPY into a single relation in PG16.

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date: 2023-07-25 15:50:19
Message-ID: 20230725155019.fzdygtylmhqfp4db@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-07-25 23:37:08 +1200, David Rowley wrote:
> On Tue, 25 Jul 2023 at 17:34, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I've not really studied the fix_COPY_DEFAULT.patch patch. Is there a
> reason to delay committing that? It would be good to eliminate that
> as a variable for the current performance regression.

Yea, I don't think there's a reason to hold off on that. Sawada-san, do you
want to commit it? Or Andrew?

> > prep:
> > COPY (SELECT generate_series(1, 2000000) a, (random() * 100000 - 50000)::int b, 3243423 c) TO '/tmp/lotsaints.copy';
> > DROP TABLE lotsaints; CREATE UNLOGGED TABLE lotsaints(a int, b int, c int);
> >
> > benchmark:
> > psql -qX -c 'truncate lotsaints' && pgbench -n -P1 -f <( echo "COPY lotsaints FROM '/tmp/lotsaints.copy';") -t 15
> >
> > I disabled turbo mode, pinned the server to a single core of my Xeon Gold 5215:
> >
> > HEAD: 812.690
> >
> > your patch: 821.354
> >
> > strtoint from 8692f6644e7: 824.543
> >
> > strtoint from 6b423ec677d^: 806.678
>
> I'm surprised to see the imul version is faster. It's certainly not
> what we found when working on 6b423ec67.

What CPUs did you test it on? I'd not be surprised if this were heavily
dependent on the microarchitecture.

> I've been fooling around a bit to try to learn what might be going on.
> I wrote 2 patches:
>
> 1) pg_strtoint_imul.patch: Effectively reverts 6b423ec67 and puts the
> code how it likely would have looked had I not done that work at all.
> (I've assumed that the hex, octal, binary parsing would have been
> added using the overflow multiplication functions just as the base 10
> parsing).

> 2) pg_strtoint_optimize.patch: I've made another pass over the
> functions with the current overflow checks and optimized the digit
> checking code so that it can be done in a single check like: if (digit
> < 10). This can be done by assigning the result of *ptr - '0' to an
> unsigned char. Anything less than '0' will wrap around and anything
> above '9' will remain so. I've also removed a few inefficiencies with
> the isspace checking. We didn't need to do "while (*ptr &&
> isspace(*ptr)). It's fine just to do while (isspace(*ptr)) since '\0'
> isn't a space char. I also got rid of the isxdigit call. The
> hexlookup array contains -1 for non-hex chars. We may as well check
> the digit value is >= 0.
>
> Here are the results I get using your test as quoted above:
>
> master @ 62e9af4c63f + fix_COPY_DEFAULT.patch
>
> latency average = 568.102 ms
>
> master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_optimize.patch
>
> latency average = 531.238 ms
>
> master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch
>
> latency average = 559.333 ms (surprisingly also faster on my machine)
>
> The optimized version of the pg_strtoint functions wins over the imul
> patch. Could you test to see if this is also the case in your Xeon
> machine?

(these are the numbers with turbo disabled, if I enable it they're all in the
6xx ms range, but the variance is higher)

fix_COPY_DEFAULT.patch
774.344

fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch
777.673

fix_COPY_DEFAULT.patch + pg_strtoint_optimize.patch
777.545

fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch
795.298

fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch + likely(isdigit())
773.477

fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch
774.443

fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch + likely(isdigit())
774.513

fix_COPY_DEFAULT.patch + pg_strtoint32_base_10_first.v2.patch + pg_strtoint_imul.patch + likely(isdigit()) + unlikely(*ptr == '_')
763.879

One idea I had was to add a fastpath that won't parse all strings, but will
parse the strings that we would generate, and fall back to the more general
variant if it fails. See the attached, rough, prototype:

fix_COPY_DEFAULT.patch + fastpath.patch:
746.971

fix_COPY_DEFAULT.patch + fastpath.patch + isdigit.patch:
715.570

Now, the precise contents of this fastpath are not yet clear (wrt imul or
not), but I think the idea has promise.

> > (when I say strtoint from, I did not replace the goto labels, so that part is
> > unchanged and unrelated)
>
> I didn't quite follow this.

I meant that I didn't undo ereport()->ereturn().

Greetings,

Andres Freund

Attachment Content-Type Size
fastpath.patch text/x-diff 1.3 KB
isdigit.patch text/x-diff 430 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Isaac Morland 2023-07-25 15:54:39 Re: cataloguing NOT NULL constraints
Previous Message Robert Haas 2023-07-25 15:39:13 Re: cataloguing NOT NULL constraints