From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz> |
Subject: | Re: slab allocator performance issues |
Date: | 2022-12-14 10:37:52 |
Message-ID: | CAFBsxsEby=vzxX31Rc5-XjkgXFs2UygY7OAHr-Az600NcgSR9A@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 13, 2022 at 7:50 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Thanks for testing the patch.
>
> On Mon, 12 Dec 2022 at 20:14, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > While allocation is markedly improved, freeing looks worse here. The
> > proportion is surprising because only about 2% of nodes are freed during
> > the load, but doing that takes up 10-40% of the time compared to allocating.
>
> I've tried to reproduce this with the v13 patches applied and I'm not
> really getting the same as you are. To run the function 100 times I
> used:
>
> select x, a.* from generate_series(1,100) x(x), lateral (select * from
> bench_load_random_int(500 * 1000 * (1+x-x))) a;
Simply running over a longer period of time like this makes the SlabFree
difference much closer to your results, so it doesn't seem out of line
anymore. Here SlabAlloc seems to take maybe 2/3 of the time of current
slab, with a 5% reduction in total time:
500k ints:
v13-0001-0005
average of 30: 217ms
47.61% postgres postgres [.] rt_set
20.99% postgres postgres [.] SlabAlloc
10.00% postgres postgres [.] rt_node_insert_inner.isra.0
6.87% postgres [unknown] [k] 0xffffffffbce011b7
3.53% postgres postgres [.] MemoryContextAlloc
2.82% postgres postgres [.] SlabFree
+slab v4
average of 30: 206ms
51.13% postgres postgres [.] rt_set
14.08% postgres postgres [.] SlabAlloc
11.41% postgres postgres [.] rt_node_insert_inner.isra.0
7.44% postgres [unknown] [k] 0xffffffffbce011b7
3.89% postgres postgres [.] MemoryContextAlloc
3.39% postgres postgres [.] SlabFree
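As a rough cross-check of the "2/3" and "5%" figures, the profile shares above can be turned into approximate absolute times. This is only a sketch: it assumes each perf percentage applies to the measured average runtime, which need not hold exactly because the profile and the timings may cover different intervals. The dictionary layout and the helper name `approx_ms` are illustrative, not part of any tool.

```python
# Convert perf profile shares into approximate per-run milliseconds.
# Assumption: each percentage applies to the "average of 30" runtime.
runs = {
    "v13-0001-0005": {"total_ms": 217, "SlabAlloc": 20.99, "SlabFree": 2.82},
    "+slab v4": {"total_ms": 206, "SlabAlloc": 14.08, "SlabFree": 3.39},
}


def approx_ms(run, symbol):
    """Approximate time spent in `symbol` for one run, in milliseconds."""
    return runs[run]["total_ms"] * runs[run][symbol] / 100.0


# Ratio of new to old time for each function, and the overall reduction.
alloc_ratio = approx_ms("+slab v4", "SlabAlloc") / approx_ms("v13-0001-0005", "SlabAlloc")
free_ratio = approx_ms("+slab v4", "SlabFree") / approx_ms("v13-0001-0005", "SlabFree")
total_reduction = 1 - runs["+slab v4"]["total_ms"] / runs["v13-0001-0005"]["total_ms"]

print(f"SlabAlloc new/old: {alloc_ratio:.2f}")
print(f"SlabFree  new/old: {free_ratio:.2f}")
print(f"total time reduction: {total_reduction:.1%}")
```

Under that assumption, SlabAlloc goes from about 45.5 ms to about 29.0 ms per run (a ratio near 0.64), SlabFree goes from about 6.1 ms to about 7.0 ms, and the total drops by roughly 5%.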
It doesn't look mysterious anymore, but I went ahead and took some more
perf measurements, including for cache misses. My naive impression is that
we're spending a bit more time waiting for data, but having to do less work
with it once we get it, which is consistent with your earlier comments:
perf stat -p $pid sleep 2
v13:
         2,001.55 msec task-clock:u        #    1.000 CPUs utilized
                0      context-switches:u  #    0.000 /sec
                0      cpu-migrations:u    #    0.000 /sec
          311,690      page-faults:u       #  155.724 K/sec
    3,128,740,701      cycles:u            #    1.563 GHz
    4,739,333,861      instructions:u      #    1.51  insn per cycle
      820,014,588      branches:u          #  409.690 M/sec
        7,385,923      branch-misses:u     #    0.90% of all branches
+slab v4:
         2,001.09 msec task-clock:u        #    1.000 CPUs utilized
                0      context-switches:u  #    0.000 /sec
                0      cpu-migrations:u    #    0.000 /sec
          326,017      page-faults:u       #  162.920 K/sec
    3,016,668,818      cycles:u            #    1.508 GHz
    4,324,863,908      instructions:u      #    1.43  insn per cycle
      761,839,927      branches:u          #  380.712 M/sec
        7,718,366      branch-misses:u     #    1.01% of all branches
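The derived columns in the perf stat output can be recomputed from the raw counters, which makes the two runs easier to compare side by side. A minimal sketch follows; the dictionary and function names are illustrative. Note that both samples cover a fixed 2-second window rather than a fixed amount of work, so the raw counts are not normalized per load and only the ratios (instructions per cycle, branch-miss rate) are directly comparable.

```python
# Raw counters copied from the two perf stat runs (2-second samples).
counters = {
    "v13": {
        "cycles": 3_128_740_701,
        "instructions": 4_739_333_861,
        "branches": 820_014_588,
        "branch_misses": 7_385_923,
    },
    "+slab v4": {
        "cycles": 3_016_668_818,
        "instructions": 4_324_863_908,
        "branches": 761_839_927,
        "branch_misses": 7_718_366,
    },
}


def ipc(run):
    """Instructions retired per cycle."""
    c = counters[run]
    return c["instructions"] / c["cycles"]


def branch_miss_rate(run):
    """Fraction of branches that were mispredicted."""
    c = counters[run]
    return c["branch_misses"] / c["branches"]


for run in counters:
    print(f"{run}: IPC {ipc(run):.2f}, branch-miss rate {branch_miss_rate(run):.2%}")
```

The lower instructions-per-cycle figure for +slab v4 (about 1.43 versus about 1.51) is what one would expect if slightly more time is spent waiting on memory.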
perf stat -e LLC-loads,LLC-load-misses -p $pid sleep 2
min/max of 3 runs:
v13: LL cache misses: 25.08% - 25.41%
+slab v4: LL cache misses: 25.74% - 26.01%
--
John Naylor
EDB: http://www.enterprisedb.com