From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Performance degradation on concurrent COPY into a single relation in PG16. |
Date: | 2023-07-25 11:37:08 |
Message-ID: | CAApHDvpwGD=zidmtVxjfKBkYxKKg9eZkSC01+JWb4WCL3WqnSA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 25 Jul 2023 at 17:34, Andres Freund <andres(at)anarazel(dot)de> wrote:
> prep:
> COPY (SELECT generate_series(1, 2000000) a, (random() * 100000 - 50000)::int b, 3243423 c) TO '/tmp/lotsaints.copy';
> DROP TABLE lotsaints; CREATE UNLOGGED TABLE lotsaints(a int, b int, c int);
>
> benchmark:
> psql -qX -c 'truncate lotsaints' && pgbench -n -P1 -f <( echo "COPY lotsaints FROM '/tmp/lotsaints.copy';") -t 15
>
> I disabled turbo mode, pinned the server to a single core of my Xeon Gold 5215:
>
> HEAD: 812.690
>
> your patch: 821.354
>
> strtoint from 8692f6644e7: 824.543
>
> strtoint from 6b423ec677d^: 806.678
I'm surprised to see the imul version is faster. It's certainly not
what we found when working on 6b423ec67.
I've been fooling around a bit to try to learn what might be going on.
I wrote 2 patches:
1) pg_strtoint_imul.patch: Effectively reverts 6b423ec67 and puts the
code how it likely would have looked had I not done that work at all.
(I've assumed that the hex, octal, binary parsing would have been
added using the overflow multiplication functions just as the base 10
parsing).
2) pg_strtoint_optimize.patch: I've made another pass over the
functions with the current overflow checks and optimized the digit
checking code so that it can be done in a single check like: if (digit
< 10). This can be done by assigning the result of *ptr - '0' to an
unsigned char. Anything less than '0' will wrap around and anything
above '9' will remain so. I've also removed a few inefficiencies with
the isspace checking. We didn't need to do "while (*ptr &&
isspace(*ptr)). It's fine just to do while (isspace(*ptr)) since '\0'
isn't a space char. I also got rid of the isxdigit call. The
hexlookup array contains -1 for non-hex chars. We may as well check
the digit value is >= 0.
Here are the results I get using your test as quoted above:
master @ 62e9af4c63f + fix_COPY_DEFAULT.patch
latency average = 568.102 ms
master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_optimize.patch
latency average = 531.238 ms
master @ 62e9af4c63f + fix_COPY_DEFAULT.patch + pg_strtoint_imul.patch
latency average = 559.333 ms (surprisingly also faster on my machine)
The optimized version of the pg_strtoint functions wins over the imul
patch. Could you test to see if this is also the case in your Xeon
machine?
> (when I say strtoint from, I did not replace the goto labels, so that part is
> unchanged and unrelated)
I didn't quite follow this.
I've not really studied the fix_COPY_DEFAULT.patch patch. Is there a
reason to delay committing that? It would be good to eliminate that
as a variable for the current performance regression.
David
Attachment | Content-Type | Size |
---|---|---|
pg_strtoint_imul.patch | text/plain | 8.8 KB |
pg_strtoint_optimize.patch | text/plain | 6.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2023-07-25 11:59:42 | Re: logical decoding and replication of sequences, take 2 |
Previous Message | Aleksander Alekseev | 2023-07-25 11:35:31 | Re: [PATCH] Check more invariants during syscache initialization |