Re: Logical replication - DDL sub transactions for script executed in single transaction?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication - DDL sub transactions for script executed in single transaction?
Date: 2019-03-22 10:43:54
Message-ID: CAFj8pRCYgXdkxv1Ha7=-cwEPgnE9UfAaoLXWcUXrP7USw911rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

pá 22. 3. 2019 v 10:49 odesílatel Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
napsal:

> Hello,
>
> I've learned that logical replication might have performance problem if
> there are lot of sub transactions within transaction (at least because it
> enforces spill files in pg_replslot and if there are many - like 80mio,
> EXT4 did not perform well - in my case it was caused bu misuse of
> exception handling in trigger function) and once we get rid of this usage
> pattern logical replication works like a charm.
> Until we have applied schema migration patch creating 700+ tables by
> flyway, therefore whole file was applied in single transaction (like psql
> -1 -f xxxx) and it effectively blocked our replication for 22 hours. No
> failures, wal sender processed did not consumed huge amount of memory (what
> happened in past wiht 80mi sub-trans caused by trigger), all wal senders
> did IO very rarely, most of the time they dwell on CPU.
>
> The spill files were not so big:
> # du -sh xid-924361123-lsn-FBC-C7000000.snap
> 224K xid-924361123-lsn-FBC-C7000000.snap
>
> It takes about 1 minute (all the time 100% CPU used by was sender) to move
> to following file.
>
> Version, OS:
> PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
>
> Perf:
> Overhead Command Shared Object Symbol
> 57.79% postgres postgres [.] hash_seq_search
> 14.70% postgres postgres [.] hash_search_with_hash_value
> 5.93% postgres postgres [.] LocalExecuteInvalidationMessage
> 3.60% postgres postgres [.] RelfilenodeMapInvalidateCallback
> 2.99% postgres postgres [.] hash_uint32
> 2.50% postgres postgres [.] hash_search
> 1.91% postgres postgres [.] CatCacheInvalidate
> 1.83% postgres postgres [.] CallSyscacheCallbacks
> 1.37% postgres postgres [.] hash_seq_init
> 1.24% postgres postgres [.] hash_seq_term
> 1.02% postgres postgres [.] RelationCacheInvalidateEntry
> 0.97% postgres postgres [.] ReorderBufferCommit
> 0.97% postgres pgoutput.so [.] rel_sync_cache_relation_cb
> 0.78% postgres postgres [.] uint32_hash
> 0.73% postgres postgres [.] PlanCacheRelCallback
> 0.60% postgres postgres [.] InvalidateCatalogSnapshot
> 0.58% postgres postgres [.] SysCacheInvalidate
> 0.35% postgres postgres [.] GetCurrentTransactionNestLevel
> 0.13% postgres pgoutput.so [.] _init
> 0.00% postgres [kernel.kallsyms] [k] __do_softirq
> 0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 0.00% postgres postgres [.] ResourceOwnerEnlargeRelationRefs
>
> My suspect is that psql (of flyway) while processing a file with many
> create table and grant statements in single transaction mode might create
> sub transactions, but I haven't found it in docs.
>

Maybe you have disabled autocommit with ON_ERROR_ROLLBACK on

I don't know about another case, where psql can use subtransactions.

Pavel

>
> Does you have similar experience? What is root cause for this issue?
>
> Workaround is clear - do not create that large amount of DDL in single
> transaction, but it would be nice to know root cause to eliminate
> scripts/usage patterns causing this issue generally.
>
> On the other hand, following transaction (100GB of spill files) , almost
> 150 mio rows deleted (and archived by statement trigger) behaves expected
> way - wal sender process was reading from disk (something like 130MB/s) and
> after ~20 minutes replication was in sync, all spill files were removed, so
> transaction changes volume itself does not seems to be any issue.
>
> Thanks for any advice!
>
> Kind regards Ales Zeleny
> P.S.: PostgreSQL log facility is superb, having DDL statements logged and
> XID in log line prefix I was able to identify the issue source in this case.
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christian Henz 2019-03-22 10:56:34 Forks of pgadmin3?
Previous Message Thomas Güttler 2019-03-22 10:23:54 Script which shows performance of ByteA: ascii vs binary