From: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: AJG <ayden(at)gera(dot)co(dot)nz>
Subject: RE: Global shared meta cache
Date: 2018-07-13 07:03:43
Message-ID: 4E72940DA2BF16479384A86D54D0988A6F14E48C@G01JPEXMBKW04
Lists: pgsql-hackers
Hi, Konstantin
>Hi,
>I really think that we need to move to global caches (and especially catalog caches) in
>Postgres.
>Modern NUMA servers may have hundreds of cores and to be able to utilize all of them,
>we may need to start large number (hundreds) of backends.
>Memory overhead of local cache multiplied by 1000 can be quite significant.
Yeah, thank you for the comment.
>I am quite skeptical about the performance results you have provided.
>Once the dataset completely fits in memory (which is true in your case), select-only
>pgbench with prepared statements should be about two times faster than without
>prepared statements. But in your case, performance with prepared statements is even
>worse.
>
>I wonder if you have repeated each measurement multiple times, to make sure that it
>is not just a fluctuation.
>Also, which postgresql configuration did you use? If it is the default postgresql.conf with
>a 128MB shared buffers size, then you are measuring disk access time, and the catalog
>cache is not relevant for performance in this case.
>
>Below are the results I got with pgbench scale 100 (with scale 10, results are slightly better)
>on my desktop with just 16GB of RAM and 4 cores:
>
>                                  | master branch | prototype | proto/master (%)
> -------------------------------------------------------------------------------
> pgbench -c10 -T60 -Msimple -S    | 187189        | 182123    | 97%
> pgbench -c10 -T60 -Msimple       | 15495         | 15112     | 97%
> pgbench -c10 -T60 -Mprepared -S  | 98273         | 92810     | 94%
> pgbench -c10 -T60 -Mprepared     | 25796         | 25169     | 97%
>
>As you see, there are no surprises here: the negative effect of the shared cache is
>largest for the case of non-prepared selects (because the selects themselves are much
>faster than updates, and during compilation we have to access relations multiple times).
>
As you pointed out, my shared_buffers and scaling factor were too small.
I ran the benchmark again with new settings, and my results seem to reproduce yours.
On a machine with 128GB of memory and 16 cores, shared_buffers was set to 32GB and
the database was initialized with -s100.
TPS results follow (mean of 10 measurements, rounded to integers):

                                      | master branch | proto  | proto/master (%)
 --------------------------------------------------------------------------------
 pgbench -c48 -T60 -j16 -Msimple -S   | 122140        | 114103 | 93
 pgbench -c48 -T60 -j16 -Msimple      | 7858          | 7822   | 100
 pgbench -c48 -T60 -j16 -Mprepared -S | 221740        | 210778 | 95
 pgbench -c48 -T60 -j16 -Mprepared    | 9257          | 8998   | 97
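For reference, the proto/master percentages in the table can be recomputed directly from the measured TPS values; a quick sketch (the numbers are copied from the table above, nothing else is assumed):

```python
# Recompute the proto/master ratio (%) from the measured TPS values above.
# Keys are the pgbench mode flags; values are (master TPS, prototype TPS).
results = {
    "-Msimple -S":   (122140, 114103),
    "-Msimple":      (7858, 7822),
    "-Mprepared -S": (221740, 210778),
    "-Mprepared":    (9257, 8998),
}

for mode, (master, proto) in results.items():
    pct = round(proto / master * 100)
    print(f"pgbench -c48 -T60 -j16 {mode}: {pct}%")
```

This reproduces the 93 / 100 / 95 / 97 column, confirming the rounding used in the table.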
As you mentioned, the SELECT-only workload shows more overhead.
(By the way, I think in a later email you mentioned results with a larger number of
concurrent clients. I'll try to check that case as well.)
====================
Takeshi Ideriha
Fujitsu Limited