
Re: Oh, this is embarrassing: init file logic is still broken

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Oh, this is embarrassing: init file logic is still broken
Date:	2015-06-24 21:52:48
Message-ID:	558B26B0.1070704@agliodbs.com
Lists:	pgsql-hackers

On 06/23/2015 04:44 PM, Tom Lane wrote:
> Chasing a problem identified by my Salesforce colleagues led me to the
> conclusion that my commit f3b5565dd ("Use a safer method for determining
> whether relcache init file is stale") is rather borked. It causes
> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
> because that index is not used by any syscache. I had been aware of that
> actually, but considered it a minor issue. It's not so minor though,
> because RelationCacheInitializePhase3 marks that index as nailed for
> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
> That means that load_relcache_init_file *always* decides that the init
> file is busted and silently(!) ignores it. So we're taking a nontrivial
> hit in backend startup speed as of the last set of minor releases.
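Here is a deliberately simplified Python model of the failure mode described in the quote. It is not PostgreSQL's C code. The three-index "nailed" set, the syscache set and the function names are illustrative assumptions, and the real list and NUM_CRITICAL_LOCAL_INDEXES differ. The idea is that the writer saves only indexes some syscache uses, while the reader insists on finding every critical nailed index. The file is therefore rejected on every load.

```python
# Simplified model of the relcache init file staleness check described above.
# The index sets and function names are illustrative assumptions only.

# Hypothetical subset of indexes that RelationCacheInitializePhase3 nails.
NAILED_LOCAL_INDEXES = {
    "pg_class_oid_index",
    "pg_attribute_relid_attnum_index",
    "pg_trigger_tgrelid_tgname_index",  # nailed for performance, no syscache
}
NUM_CRITICAL_LOCAL_INDEXES = len(NAILED_LOCAL_INDEXES)

# Hypothetical subset of indexes that back some syscache.
SYSCACHE_INDEXES = {
    "pg_class_oid_index",
    "pg_attribute_relid_attnum_index",
}


def write_init_file():
    """Writer side: keep only nailed indexes that some syscache uses (the bug)."""
    return sorted(NAILED_LOCAL_INDEXES & SYSCACHE_INDEXES)


def load_init_file(entries):
    """Reader side: use the file only if every critical nailed index is present.

    Returns True if the file would be used, False if it is silently ignored.
    """
    nailed_found = sum(1 for name in entries if name in NAILED_LOCAL_INDEXES)
    return nailed_found == NUM_CRITICAL_LOCAL_INDEXES


if __name__ == "__main__":
    entries = write_init_file()
    # The trigger index is never written, so the count check always fails
    # and each new backend falls back to building its relcache the slow way.
    print(load_init_file(entries))  # False
```

In this model, adding the trigger index back to what the writer saves makes the count check pass again.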

OK, this is pretty bad in its real performance effects. On a workload
which is dominated by new connection creation, we've lost about 17%
throughput.

To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a
database which fits in shared_buffers on two different m3.large
instances on AWS (across the network, not on unix sockets). A typical
run on 9.3.6 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 252322
tps = 210.267219 (including connections establishing)
tps = 31958.233736 (excluding connections establishing)
statement latencies in milliseconds:
0.002515 \set naccounts 100000 * :scale
0.000963 \setrandom aid 1 :naccounts
19.042859 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Whereas a typical run on 9.3.9 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 208180
tps = 173.482259 (including connections establishing)
tps = 31092.866153 (excluding connections establishing)
statement latencies in milliseconds:
0.002518 \set naccounts 100000 * :scale
0.000988 \setrandom aid 1 :naccounts
23.076961 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
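To compare runs like these across many repetitions, the summary lines can be pulled out mechanically. Below is a minimal sketch that assumes the 9.3-era wording shown above; other pgbench releases may word these lines differently, and the parser name is hypothetical. The large gap between the two tps figures is worth noticing. Excluding connection setup, the rate is roughly 150 times higher, so under -C nearly all of the run is spent opening connections.

```python
import re

# Summary lines copied from the 9.3.9 run in this message.
SAMPLE = """\
duration: 1200 s
number of transactions actually processed: 208180
tps = 173.482259 (including connections establishing)
tps = 31092.866153 (excluding connections establishing)
"""


def parse_pgbench_summary(text):
    """Extract duration, transaction count and both tps figures.

    The patterns match the wording of the output quoted in this message.
    """
    result = {}
    m = re.search(r"^duration: (\d+) s$", text, re.MULTILINE)
    if m:
        result["duration_s"] = int(m.group(1))
    m = re.search(r"number of transactions actually processed: (\d+)", text)
    if m:
        result["transactions"] = int(m.group(1))
    for value, kind in re.findall(
        r"tps = ([\d.]+) \((including|excluding) connections establishing\)", text
    ):
        result["tps_" + kind] = float(value)
    return result


if __name__ == "__main__":
    s = parse_pgbench_summary(SAMPLE)
    # The "including" figure is essentially transactions / elapsed time.
    print(s["transactions"] / s["duration_s"], s["tps_including"])
```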

Numbers are pretty consistent across four runs each on two different
instances (+/- 4%), so I don't think this is Amazon variability we're
seeing. I think the ignored relcache init file is really costing us 17%. :-(
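The 17% figure can be checked directly from the two runs quoted in this message. The short calculation below uses only those numbers. Both the transaction count and the including-connections tps drop by about 17.5%. The excluding-connections tps moves by only about 2.7%, which is within the stated +/- 4% noise. That pattern fits the explanation that the extra cost is paid at backend startup rather than during query execution.

```python
# Figures copied from the two pgbench runs in this message.
RUNS = {
    "9.3.6": {"transactions": 252322, "tps_incl": 210.267219, "tps_excl": 31958.233736},
    "9.3.9": {"transactions": 208180, "tps_incl": 173.482259, "tps_excl": 31092.866153},
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


old, new = RUNS["9.3.6"], RUNS["9.3.9"]
for key in ("transactions", "tps_incl", "tps_excl"):
    # prints -17.5%, -17.5% and -2.7% respectively
    print(f"{key}: {pct_change(old[key], new[key]):+.1f}%")
```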

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Oh, this is embarrassing: init file logic is still broken at 2015-06-23 23:44:36 from Tom Lane

Responses

Re: Oh, this is embarrassing: init file logic is still broken at 2015-06-25 08:20:25 from Tatsuo Ishii
Re: Oh, this is embarrassing: init file logic is still broken at 2015-06-25 17:47:09 from Peter Geoghegan
Re: Oh, this is embarrassing: init file logic is still broken at 2015-06-29 15:39:56 from Tatsuo Ishii
