From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Why is pq_begintypsend so slow? |
Date: | 2020-01-11 19:04:51 |
Message-ID: | CAMkU=1whbRDUwa4eayD9+59K-coxO9senDkPRbTn3cg0pUz4AQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I was using COPY recently and was wondering why BINARY format is not much
(if any) faster than the default format. Once I switched from mostly
exporting ints to mostly exporting double precisions (7e6 rows of 100
columns, randomly generated), it was faster, but not by as much as I
intuitively thought it should be.
Running 'perf top' to profile a "COPY BINARY .. TO '/dev/null'" on a AWS
m5.large machine running Ubuntu 18.04, with self compiled PostgreSQL:
PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
I saw that the hotspot was pq_begintypsend at 20%, which was twice the
percentage as the next place winner (AllocSetAlloc). If I drill down into
teh function, I see something like the below. I don't really speak
assembly, but usually when I see an assembly instruction being especially
hot and not being the inner most instruction in a loop, I blame it on CPU
cache misses. But everything being touched here should already be well
cached, since initStringInfo has just got done setting it up. And if not
for that, then the by the 2nd invocation of appendStringInfoCharMacro it
certainly should be in the cache, yet that one is even slower than the 1st
appendStringInfoCharMacro.
Why is this such a bottleneck?
pq_begintypsend /usr/local/pgsql/bin/postgres
0.15 | push %rbx
0.09 | mov %rdi,%rbx
| initStringInfo(buf);
3.03 | callq initStringInfo
| /* Reserve four bytes for the bytea length word */
| appendStringInfoCharMacro(buf, '\0');
| movslq 0x8(%rbx),%rax
1.05 | lea 0x1(%rax),%edx
0.72 | cmp 0xc(%rbx),%edx
| jge b0
2.92 | mov (%rbx),%rdx
| movb $0x0,(%rdx,%rax,1)
13.76 | mov 0x8(%rbx),%eax
0.81 | mov (%rbx),%rdx
0.52 | add $0x1,%eax
0.12 | mov %eax,0x8(%rbx)
2.85 | cltq
0.01 | movb $0x0,(%rdx,%rax,1)
| appendStringInfoCharMacro(buf, '\0');
10.65 | movslq 0x8(%rbx),%rax
| lea 0x1(%rax),%edx
0.90 | cmp 0xc(%rbx),%edx
| jge ca
0.54 | 42: mov (%rbx),%rdx
1.84 | movb $0x0,(%rdx,%rax,1)
13.88 | mov 0x8(%rbx),%eax
0.03 | mov (%rbx),%rdx
| add $0x1,%eax
0.33 | mov %eax,0x8(%rbx)
2.60 | cltq
0.06 | movb $0x0,(%rdx,%rax,1)
| appendStringInfoCharMacro(buf, '\0');
3.21 | movslq 0x8(%rbx),%rax
0.23 | lea 0x1(%rax),%edx
1.74 | cmp 0xc(%rbx),%edx
| jge e0
0.21 | 67: mov (%rbx),%rdx
1.18 | movb $0x0,(%rdx,%rax,1)
9.29 | mov 0x8(%rbx),%eax
0.18 | mov (%rbx),%rdx
| add $0x1,%eax
0.19 | mov %eax,0x8(%rbx)
3.14 | cltq
0.12 | movb $0x0,(%rdx,%rax,1)
| appendStringInfoCharMacro(buf, '\0');
5.29 | movslq 0x8(%rbx),%rax
0.03 | lea 0x1(%rax),%edx
1.45 | cmp 0xc(%rbx),%edx
| jge f6
0.41 | 8c: mov (%rbx),%rdx
Cheers,
Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-01-11 20:19:37 | Re: Why is pq_begintypsend so slow? |
Previous Message | David Fetter | 2020-01-11 17:53:36 | Re: Make autovacuum sort tables in descending order of xid_age |