Re: Logical decoding CPU-bound w/ large number of tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: Mathieu Fenniak <mathieu(dot)fenniak(at)replicon(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Logical decoding CPU-bound w/ large number of tables
Date: 2017-05-06 01:20:55
Message-ID: 20170506012055.hhmx5hirj2yk7d3g@alap3.anarazel.de
Lists: pgsql-general

Hi,

On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote:
> The stalls occur unpredictably on my production system, but generally seem
> to be correlated with schema operations. My source database has about
> 100,000 tables; it's a one-schema-per-tenant multi-tenant SaaS system.

I'm unfortunately not entirely surprised you're seeing some issues in
that case. We're invalidating internal caches a bit overzealously,
and that invalidation is triggered by schema changes.

> I've performed a CPU sampling with the OSX `sample` tool based upon
> reproduction approach #1:
> https://gist.github.com/mfenniak/366d7ed19b2d804f41180572dc1600d8 It
> appears that most of the time is spent in the
> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache
> invalidation callbacks, both of which appear to be invalidating caches
> based upon the cache value.

I think optimizing those has some value (and I see Tom is looking at
that aspect), but the bigger win would probably be to do fewer lookups.
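To illustrate why value-based invalidation gets expensive with ~100,000 tables, here is a toy model. It is not PostgreSQL code, and its names (ToyCache, invalidate_by_scan, invalidate_by_index, total_work) are hypothetical. It assumes a cache keyed by one identifier that receives invalidation messages naming a different identifier (the stored value). Without an index on that value, each message forces a scan of every entry, so the total cost grows roughly as entries × messages. A reverse index touches only the matching entries.

```python
class ToyCache:
    """Toy model (not PostgreSQL source): a key -> value cache that is
    invalidated by *value*, either by scanning or via a reverse index."""

    def __init__(self, n_entries):
        # Hypothetical mapping, e.g. a relfilenode-like key -> relation-OID-like value.
        self.entries = {key: key + 1_000_000 for key in range(n_entries)}
        # Reverse index: value -> set of keys currently holding that value.
        self.by_value = {}
        for key, value in self.entries.items():
            self.by_value.setdefault(value, set()).add(key)
        self.work = 0  # entries inspected; a rough proxy for CPU cost

    def _drop(self, key):
        # Remove one entry and keep the reverse index consistent.
        value = self.entries.pop(key)
        keys = self.by_value[value]
        keys.discard(key)
        if not keys:
            del self.by_value[value]

    def invalidate_by_scan(self, value):
        # No index on the value: every entry must be examined.
        matches = []
        for key, cached in self.entries.items():
            self.work += 1
            if cached == value:
                matches.append(key)
        for key in matches:
            self._drop(key)

    def invalidate_by_index(self, value):
        # Reverse index: only the matching entries are touched.
        for key in list(self.by_value.get(value, ())):
            self.work += 1
            self._drop(key)


def total_work(n_entries, n_messages, indexed):
    """Apply n_messages invalidations (each naming one cached value) and
    return the number of entries inspected."""
    cache = ToyCache(n_entries)
    for i in range(n_messages):
        value = i + 1_000_000
        if indexed:
            cache.invalidate_by_index(value)
        else:
            cache.invalidate_by_scan(value)
    return cache.work
```

In this model, total_work(100_000, 1_000, indexed=False) inspects 99,500,500 entries, while the indexed variant inspects 1,000. Making each invalidation cheaper is one lever. Reducing how many invalidations and lookups happen in the first place, as suggested above, shrinks the n_messages factor for both variants.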

> Has anyone else run into this kind of performance problem? Any thoughts on
> how it might be resolved? I don't mind putting in the work if someone
> could describe what is happening here, and have a discussion with me about
> what kind of changes might be necessary to improve the performance.

If you could provide an easily runnable SQL script that reproduces the
issue, I'll have a look. I think I have a rough idea of what to do.

Greetings,

Andres Freund
