Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
> Concurrency 8 results should probably be ignored - variance was
> huge, definitely more than the differences.
I'm not so sure it should be ignored -- one thing I noticed in
looking at the raw numbers from my benchmarks was that the -O2 code
was much more consistent from run to run than the -O3 code. I doubt
that the more aggressive optimizations were developed under NUMA
architecture, and I suspect that the aggressively optimized code may
be more sensitive to whether memory is directly accessed by the core
running the process or routed though the memory controller on
another core. (I hit on this idea this morning when I remembered
seeing similar variations in run times of STREAM against our new
servers with NUMA.)
This suggests that in the long term, it might be worth investigating
whether we can arrange for a connection's process to have some
degree of core affinity and encourage each process to allocate local
memory from RAM controlled by that core. To some extent I would
expect the process-based architecture of PostgreSQL to help with
that, as you would expect a NUMA-aware OS to try to arrange that to
some degree.
-Kevin