Is it possible to make ReorderBufferRestoreCleanup faster?

From: Evgeny Kuzin <evgeny(dot)kuzin(at)outlook(dot)com>
To: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Is it possible to make ReorderBufferRestoreCleanup faster?
Date: 2024-12-17 18:01:21
Message-ID: AM6P193MB031068045A62599D3F53E19C97042@AM6P193MB0310.EURP193.PROD.OUTLOOK.COM
Lists: pgsql-bugs

We had an issue with Postgres logical replication after an 8-hour transaction that a rogue user ran overnight.
All logical replication processes were 100% CPU-bound and stuck in calls like these:

unlink("pg_replslot/data/xid-1719052643-lsn-4854E-FE000000.spill") = -1 ENOENT (No such file or directory) <0.000008>
unlink("pg_replslot/data/xid-1719052643-lsn-4854E-FF000000.spill") = -1 ENOENT (No such file or directory) <0.000008>
unlink("pg_replslot/data/xid-1719052643-lsn-4854F-0.spill") = -1 ENOENT (No such file or directory) <0.000010>
unlink("pg_replslot/data/xid-1719052643-lsn-4854F-1000000.spill") = -1 ENOENT (No such file or directory) <0.000008>
unlink("pg_replslot/data/xid-1719052643-lsn-4854F-2000000.spill") = -1 ENOENT (No such file or directory) <0.000008>
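As far as I can tell from the PostgreSQL source, ReorderBufferRestoreCleanup does not look at which spill files exist. It walks every WAL segment number between the transaction's first and final LSN and calls unlink() on the corresponding name, ignoring ENOENT. The sketch below reproduces only that naming loop, assuming the default 16 MB WAL segment size. `spill_path` and `count_cleanup_candidates` are hypothetical helpers, not PostgreSQL functions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the default 16 MB WAL segment size (wal_segment_size). */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical helper: build the spill-file name for one WAL segment,
 * matching the "xid-<xid>-lsn-<hi>-<lo>.spill" names in the unlink() calls. */
static void
spill_path(char *buf, size_t len, const char *slot, uint32_t xid, uint64_t segno)
{
    uint64_t lsn = segno * WAL_SEGMENT_SIZE;   /* first LSN of the segment */

    snprintf(buf, len, "pg_replslot/%s/xid-%" PRIu32 "-lsn-%" PRIX32 "-%" PRIX32 ".spill",
             slot, xid, (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Hypothetical helper: walk every segment from first_lsn to final_lsn the way
 * the cleanup loop does, printing up to print_limit candidate names.
 * Returns the number of candidates, i.e. unlink() attempts, regardless of
 * how many of those files actually exist. */
static uint64_t
count_cleanup_candidates(const char *slot, uint32_t xid,
                         uint64_t first_lsn, uint64_t final_lsn,
                         uint64_t print_limit)
{
    uint64_t first = first_lsn / WAL_SEGMENT_SIZE;
    uint64_t last = final_lsn / WAL_SEGMENT_SIZE;
    char path[256];

    if (final_lsn < first_lsn)
        return 0;

    for (uint64_t cur = first; cur <= last; cur++)
    {
        if (cur - first < print_limit)
        {
            spill_path(path, sizeof path, slot, xid, cur);
            puts(path);
        }
    }
    return last - first + 1;
}
```

Called with the bounds 4854E/FE000000 and 4854F/2000000 from the lines above, it yields exactly those five names. The number of attempts depends only on how much WAL lies between the two LSNs: a transaction spanning 1 TiB of WAL would mean 65,537 unlink() calls for a single cleanup, which fits ENOENT dominating.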

Stopping the publisher wasn't easy either: we had to change the directory permissions so that it would crash, then increase the logical replication memory to a few hundred GB, chown the directory back to postgres, and start it again. After that, it got through in 20 minutes at most. Before that, with 30GB allotted to logical replication, progress was so slow that basic math told us it would take two weeks to clear the replication lag.

There were hundreds of thousands of files in this directory, but the ENOENT failures far outnumbered the files that actually existed.
Could ReorderBufferRestoreCleanup be optimized somehow, for example by running it in parallel?
