Re: Do we need to handle orphaned prepared transactions in the server?

From: Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Kellerer <shammat(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Do we need to handle orphaned prepared transactions in the server?
Date: 2020-03-02 12:42:11
Message-ID: CANugjhsA-1ORGbV7mGW-dBLOK=2QJBTDqj5wkH=6-wB5sKj7XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is the v2 of the same patch after rebasing it and running it through
pgindent. There are no other code changes.

On Wed, Feb 19, 2020 at 8:04 PM Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com> wrote:

> All,
>
> Attached is version 1 of POC patch for notifying of orphaned
> prepared transactions via warnings emitted to a client
> application and/or log file. It applies to PostgreSQL branch
> "master" on top of "e2e02191" commit.
>
> I've tried to keep the patch as less invasive as I could with
> minimal impact on vacuum processes, so the performance impact
> and the changes are minimal in that area of PostgreSQL core.
>
>
> - What's in this Patch:
>
> This patch throws warnings when an autovacuum worker encounters
> an orphaned prepared transaction. It also throws warnings to a
> client when a vacuum command is issued. This patch also
> introduces two new GUCs:
>
> (1) max_age_prepared_xacts
> - The age after creation of a prepared transaction after which it
> will be considered an orphan.
>
> (2) prepared_xacts_vacuum_warn_timeout
> - The timeout period for an autovacuum (essentially any of its
> worker) to check for orphaned prepared transactions and throw
> warnings if any are found.
>
>
> - What This Patch Does:
>
> If the GUCs are enabled (set to a value higher than -1), an
> autovacuum worker running in the background checks if the
> timeout has expired. If so, it checks if there are any orphaned
> prepared transactions (i.e. their age has exceeded
> max_age_prepared_xacts). If it finds any, it throws a warning for
> every such transaction. It also emits the total number of orphaned
> prepared transactions if one or more are found.
>
> When a vacuum command is issued from within a client, say psql,
> in that case, we skip the vacuum timeout check and simply scan
> for any orphaned prepared transactions. Warnings are emitted to
> the client and log file if any are found.
>
>
> - About the New GUCs:
>
> = max_age_prepared_xacts:
> Sets maximum age after which a prepared transaction is considered an
> orphan. It applies when "prepared transactions" are enabled. The
> age for a transaction is calculated from the time it was created to
> the current time. If this value is specified without units, it is taken
> as milliseconds. The default value is -1 which allows prepared
> transactions to live forever.
>
> = prepared_xacts_vacuum_warn_timeout:
> Sets timeout after which vacuum starts throwing warnings for every
> prepared transactions that has exceeded maximum age defined by
> "max_age_prepared_xacts". If this value is specified without units,
> it is taken as milliseconds. The default value of -1 will disable
> this warning mechanism. Setting a too value could potentially fill
> up log with orphaned prepared transaction warnings, so this
> parameter must be set to a value that is reasonably large to not
> fill up log file, but small enough to notify of long running and
> potential orphaned prepared transactions. There is no additional
> timer or worker introduced with this change. Whenever a vacuum
> worker runs, it first checks for any orphaned prepared transactions.
> So at best, this GUC serves as a guideline for a vacuum worker
> if a warning should be thrown to log file or a client issuing
> vacuum command.
>
>
> - What this Patch Does Not Cover:
>
> The warning is not thrown when user either runs vacuumdb or passes
> individual relations to be vacuum. Specifically in case of vacuumdb,
> it breaks down a vacuum command to an attribute-wise vacuum command.
> So the vacuum command is indirectly run many times. Considering that
> we want to emit warnings for every manual vacuum command, this simply
> floods the terminal and log with orphaned prepared transactions
> warnings. We could potentially handle that, but the overhead of
> that seemed too much to me (and I've not invested any time looking
> to fix that either). Hence, warnings are not thrown when user runs
> vacuumdb and relation specific vacuum.
>
>
>
> On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer <craig(at)2ndquadrant(dot)com>
>> wrote:
>>
>>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
>>> wrote:
>>> >
>>> > So having seen the feedback on this thread, and I tend to agree with
>>> most of what has been said here, I also agree that the server core isn't
>>> really the ideal place to handle the orphan prepared transactions.
>>> >
>>> > Ideally, these must be handled by a transaction manager, however, I do
>>> believe that we cannot let database suffer for failing of an external
>>> software, and we did a similar change through introduction of idle in
>>> transaction timeout behavior.
>>>
>>> The difference, IMO, is that idle-in-transaction aborts don't affect
>>> anything we've promised to be durable.
>>>
>>> Once you PREPARE TRANSACTION the DB has made a promise that that txn
>>> is durable. We don't have any consistent feedback channel to back to
>>> applications and say "Hey, if you're not going to finish this up we
>>> need to get rid of it soon, ok?". If a distributed transaction manager
>>> gets consensus for commit and goes to COMMIT PREPARED a previously
>>> prepared txn only to find that it has vanished, that's a major
>>> problem, and one that may bring the entire DTM to a halt until the
>>> admin can intervene.
>>>
>>> This isn't like idle-in-transaction aborts. It's closer to something
>>> like uncommitting a previously committed transaction.
>>>
>>> I do think it'd make sense to ensure that the documentation clearly
>>> highlights the impact of abandoned prepared xacts on server resource
>>> retention and performance, preferably with pointers to appropriate
>>> views. I haven't reviewed the docs to see how clear that is already.
>>>
>>
>> Having seen the documentation, IMHO the document does contain enough
>> information for users to understand what issues can be caused by these
>> orphaned prepared transactions.
>>
>>
>>>
>>> I can also see an argument for a periodic log message (maybe from
>>> vacuum?) warning when old prepared xacts hold xmin down. Including one
>>> sent to the client application when an explicit VACUUM is executed.
>>> (In fact, it'd make sense to generalise that for all xmin-retention).
>>>
>>
>> I think that opens up the debate on what we really mean by "old" and
>> whether that requires a syntax change when creating a prepared
>> transactions as Thomas Kellerer suggested earlier?
>>
>> I agree that vacuum should periodically throw warnings for any prepared
>> xacts that are holding xmin down.
>>
>> Generalising it for all xmin-retention is a fair idea IMHO, though that
>> does increase the overall scope here. A vacuum process should (ideally)
>> periodically throw out warnings for anything that is preventing it
>> (including
>> orphaned prepared transactions) from doing its routine work so that
>> somebody can take necessary actions.
>>
>>
>>> But I'm really not a fan of aborting such txns. If you operate with
>>> some kind of broken global transaction manager that can forget or
>>> abandon prepared xacts, then fix it, or adopt site-local periodic
>>> cleanup tasks that understand your site's needs.
>>>
>>> --
>>> Craig Ringer http://www.2ndQuadrant.com/
>>> 2ndQuadrant - PostgreSQL Solutions for the Enterprise
>>>
>>
>>
>> --
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>> ADDR: 10318 WHALLEY BLVD, Surrey, BC
>> CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
>> SKYPE: engineeredvirus
>>
>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
> SKYPE: engineeredvirus
>

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
SKYPE: engineeredvirus

Attachment Content-Type Size
orphaned_pxact.v2.patch application/octet-stream 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-03-02 13:07:14 Re: Minor issues in .pgpass
Previous Message Daniel Gustafsson 2020-03-02 12:32:57 Re: Change atoi to strtol in same place