Re: Logical decoding CPU-bound w/ large number of tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Logical decoding CPU-bound w/ large number of tables
Date: 2017-05-06 00:59:09
Message-ID: 18109.1494032349@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com> writes:
> I'm attempting to use logical decoding with the streaming replication
> protocol to perform change-data-capture on PostgreSQL 9.5.4. I'm seeing
> the replication stream "stall" for long periods of time where the walsender
> process will be pinned at 100% CPU utilization, but no data is being sent
> to my client.

> I've performed a CPU sampling with the OSX `sample` tool based upon
> reproduction approach #1:
> https://gist.github.com/mfenniak/366d7ed19b2d804f41180572dc1600d8
> It appears that most of the time is spent in the
> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache
> invalidation callbacks, both of which appear to be invalidating caches
> based upon the cache value.

Hmm ... as for RelfilenodeMapInvalidateCallback, the lack of calls to
hash_search() from it in your trace says that it usually isn't doing
anything useful. All the time is being spent in hash_seq_search,
uselessly iterating over the hashtable. I'm inclined to think that
we need a smarter data structure there, maybe an independent hashtable
tracking the reverse map from relation OID to filenode map entry.

As for CatalogCacheIdInvalidate, I wonder how many of those cycles
are doing something useful, and how many are being wasted in the outer
loop that just iterates over the cache list. We could trivially get
rid of that outer search by using syscache.c's array, as in the
attached patch. It'd be interesting to see if this patch helps your
scenario #1. (Patch is against HEAD but seems to apply cleanly to 9.5)

Most likely, your scenario #2 is completely stuck on the
RelfilenodeMapInvalidateCallback issue, though it would be good
to get a trace to confirm that.

regards, tom lane

Attachment Content-Type Size
avoid-search-in-catcache-invalidate.patch text/x-diff 8.0 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2017-05-06 01:14:51 Re: PG96 pg_restore connecting to PG95 causes ERROR: unrecognized configuration parameter "idle_in_transaction_session_timeout"
Previous Message Justin Pryzby 2017-05-06 00:22:53 PG96 pg_restore connecting to PG95 causes ERROR: unrecognized configuration parameter "idle_in_transaction_session_timeout"