Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

From: Pietro Pugni <pietro(dot)pugni(at)gmail(dot)com>
To: didier <did447(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Date: 2015-04-03 13:43:02
Message-ID: 0E6681E8-7412-4A8F-8AEC-FD6A29820B66@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi didier,
thank you for your time.
I forgot to display before the output of free. I’ve looked into it before and I found difficult to fully understand if there was something wrong.

Before starting Postgres:
total used free shared buffers cached
Mem: 125G 9G 115G 15M 362M 8.1G
-/+ buffers/cache: 1.5G 124G
Swap: 127G 0B 127G

Here’s an example of free output when queries B_1 and B_2 are running (they’re part of the same transaction). Generally values remains the same. For what I can understand, RAM isn’t used at all (there’s a lot of unused RAM).

total used free shared buffers cached
Mem: 125G 13G 112G 3.1G 362M 11G
-/+ buffers/cache: 1.9G 123G
Swap: 127G 0B 127G

With Postgres running after transaction has been executed:
total used free shared buffers cached
Mem: 125G 13G 112G 3.1G 362M 11G
-/+ buffers/cache: 1.5G 124G
Swap: 127G 0B 127G

> there's also huge page
> /sys/kernel/mm/transparent_hugepage/enabled
> can you try to disable it?
It was enabled and after disabling it nothing changed: time execution is practically the same (131s for the same transaction tested in previous emails, which is composed by queries B_1 and B_2).

> Also test on the dell:
> select tmp.cf, tmp.dt from grep_studi.tmp;
> and
> select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;
> in Query B_2
> the sort is 9 time slower on the dell, you have to find why…
Here’s the output for the two queries:

> select tmp.cf, tmp.dt from grep_studi.tmp;

"Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.082..618.709 rows=2951191 loops=1)"
" Output: cf, dt"
" Buffers: shared hit=512 read=7802 dirtied=8314"
"Planning time: 0.087 ms"
"Execution time: 745.505 ms"

> select tmp.cf, tmp.dt from grep_studi.tmp;

"Sort (cost=38431.55..39104.99 rows=1346868 width=72) (actual time=3146.548..3306.179 rows=2951191 loops=1)"
" Output: cf, dt"
" Sort Key: tmp.cf"
" Sort Method: quicksort Memory: 328866kB"
" Buffers: shared hit=8317"
" -> Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.012..373.346 rows=2951191 loops=1)"
" Output: cf, dt"
" Buffers: shared hit=8314"
"Planning time: 0.034 ms"
"Execution time: 3459.065 ms"

> 32 GB for buffers is too high for the queries in your test but it
> doesn't matter.

I’ve set shared_buffers to be 1/4 of the total RAM. I’ve changed kernel values to accomodate this value. Lowering to smaller values doesn’t improve the transaction results. Here’s the results with 1 run for each level of shared_buffers:

32GB: 131s
16GB: 132s
8GB: 133s
4GB: 132s
2GB: 143s
1GB: 148s
512MB: 183s
256MB: 192s

Probably I can keep 4GB but I make use of several partitions with tens of millions of records each. This is why I keep shared_buffers high. My applications is also similar to a DWH solution with one user. Like you said, big values of shared_buffers shouldn’t be a issue.

I’ve done some tests with sysbench on Dell T420 and MacMini.

T420 - RAM READ - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: read
Memory scope type: global
Threads started!
Done.

Operations performed: 16384 (3643025.32 ops/sec)

16384.00 MB transferred (3643025.32 MB/sec)

Test execution summary:
total time: 0.0045s
total number of events: 16384
total time taken by event execution: 0.0031
per-request statistics:
min: 0.00ms
avg: 0.00ms
max: 0.02ms
approx. 95 percentile: 0.00ms

Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 0.0031/0.00

MacMini - RAM READ - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Threads started!

Operations performed: 16384 ( 5484.50 ops/sec)

16384.00 MB transferred (5484.50 MB/sec)

General statistics:
total time: 2.9873s
total number of events: 16384
total time taken by event execution: 2.9836s
response time:
min: 0.18ms
avg: 0.18ms
max: 0.24ms
approx. 95 percentile: 0.19ms

Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 2.9836/0.00

T420 - RAM WRITE - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 16384 ( 8298.97 ops/sec)

16384.00 MB transferred (8298.97 MB/sec)

Test execution summary:
total time: 1.9742s
total number of events: 16384
total time taken by event execution: 1.9723
per-request statistics:
min: 0.12ms
avg: 0.12ms
max: 0.25ms
approx. 95 percentile: 0.12ms

Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 1.9723/0.00

MacMini - RAM WRITE - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Threads started!

Operations performed: 16384 ( 5472.90 ops/sec)

16384.00 MB transferred (5472.90 MB/sec)

General statistics:
total time: 2.9937s
total number of events: 16384
total time taken by event execution: 2.9890s
response time:
min: 0.18ms
avg: 0.18ms
max: 0.32ms
approx. 95 percentile: 0.19ms

Threads fairness:
events (avg/stddev): 16384.0000/0.00
execution time (avg/stddev): 2.9890/0.00

T420 - CPU
sh-4.3# sysbench --test=cpu run
sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000

Test execution summary:
total time: 13.0683s
total number of events: 10000
total time taken by event execution: 13.0674
per-request statistics:
min: 1.30ms
avg: 1.31ms
max: 1.44ms
approx. 95 percentile: 1.35ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 13.0674/0.00

MacMini - CPU
server:sysbench Pietro$ ./sysbench --test=cpu run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Primer numbers limit: 10000

Threads started!

General statistics:
total time: 11.5728s
total number of events: 10000
total time taken by event execution: 11.5703s
response time:
min: 1.15ms
avg: 1.16ms
max: 2.17ms
approx. 95 percentile: 1.17ms

Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 11.5703/0.00

I’ve done these tests because someone else on this discussion asked me to investigate on memory bandwidth and because I found this interesting article about Intel Xeon vs Intel i5 with different Postgres versions: http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-pgbench
Hope this helps to better understand the problem.

Thank you very much.
Best regards,
Pietro

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2015-04-03 13:46:45 Re: PATCH: adaptive ndistinct estimator v4
Previous Message didier 2015-04-02 15:11:41 Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL