Re: BUG #14231: logical replication wal sender process spins when using error traps in function

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: blake(at)rcmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14231: logical replication wal sender process spins when using error traps in function
Date: 2016-07-06 21:00:46
Message-ID: 878txea5b4.fsf@news-spur.riddles.org.uk
Lists: pgsql-bugs

>>>>> "blake" == blake <blake(at)rcmail(dot)com> writes:

blake> Connect a logical client with test_decoding plugin and run the
blake> code below. It will cause the replication process to spin,
blake> using 100% CPU for some long period of time.

So, since you mentioned this on IRC, here is what my analysis has found
so far:

1. The time is being spent in ReorderBufferCleanupTXN and the functions
it calls. This is called once for the transaction and (recursively) once
per subtransaction.

2. Within each of those calls, the main culprit seems to be pathological
behavior of the retail pfree() calls of allocated memory. The loop in
AllocSetFree which chases down the allocated block list (for chunks over
the chunk limit) is being executed nearly 900 million times for what
seems to be about 42 thousand calls.

A quick scan of the code suggests that the worst case is when blocks are
being freed in FIFO order, which seems quite plausible in this case, and
the performance is potentially O(N^2).

So I think this is primarily an artifact of doing so much retail
palloc/pfree in a single memory context.

--
Andrew (irc:RhodiumToad)
