From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | chris(dot)travers(at)adjust(dot)com |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Two proposed modifications to the PostgreSQL FDW |
Date: | 2018-08-22 09:29:44 |
Message-ID: | CAD21AoATFZ8+n5338ndu4HP1m8O43RooCyS87GhMJFgYE5U74g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 22, 2018 at 5:13 PM Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
>
>
>
> On Wed, Aug 22, 2018 at 9:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>> On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
>> > The two things I would suggest is that rather than auto-detecting (if I understand your patch correctly) whether prepared transactions are possible on the other system, making it an option to the foreign server or foreign table. Otherwise one might enable prepared transactions for one set of operations on one database and have it automatically cause headaches in another context.
>>
>> Yeah, currently it's an option for foreign servers. The patch adds a
>> new option "two_phase_commit" to postgres_fdw.
>
>
> Seems sane.
>>
>>
>> >
>> > The other thing I wasn't quite sure about on your patch was what happens if, say, someone trips over a power cord while the background worker is trying to commit things, whether the information is available on the initiating server when it comes back. whether a DBA has to go out and try to figure out what needs to be committed remotely, and how this would be done. If you could explain that process, that would be helpful to me.
>> >
>> > (In my approach these would be recorded on the master and an SQL function could re-initiate the background worker.)
>>
>> My approach is almost the same as yours. For details, in the
>> pre-commit we do WAL-logging for each participants server before
>> preparing transactions on the remote sides. The WAL has information of
>> participants foreign servers(foreign server oid, database oid etc) and
>> its global transaction identifier. Even if plug-pulled during trying
>> to commit we can recover the global transactions that are not
>> completed yet and its participants information from WAL. After the
>> recovery users needs to execute the SQL function to fix the
>> in-completed global transactions. Since the function can find out
>> whether the remote transaction should be committed or rollback-ed by
>> checking CLOG. Does my answer make sense?
>
>
> Yeah. That is probably more elegant than my solution. I do wonder though if the next phase would not be to add some sort of hook to automatically start the background worker in this case.
>>
>>
>> >>
>> >>
>> >> >
>> >> > Moreover since COMMIT PREPARED occurs during the commit hook, not the precommit hook, it is too late to roll back the local transaction. We cannot raise errors since this causes a conflict in the commit status of the local transaction. So when we commit the local transaction we commit to committing all prepared transactions as soon as possible. Note some changes need to be made to make this usable in the FDW context, so what I am hoping is that the dialog helps impact the discussion and options going forward.
>> >> >
>> >> >>
>> >> >> Also
>> >> >> since we don't want to wait for COMMIT PREPARED to complete we need to
>> >> >> consider that users could cancel the query anytime. To not break the
>> >> >> current semantics we cannot raise error during 2nd phase of two-phase
>> >> >> commit but it's not realistic because even the palloc() can raise an
>> >> >> error.
>> >> >
>> >> >
>> >> > We don't palloc. All memory used here is on the stack. I do allow for dramatic precondition checks to cause errors but those should never happen absent some other programmer doing something dramatically unsafe anyway. For example if you try to double-commit a transaction set.....
>> >>
>> >> Sorry, palloc() is just an example. I'm not sure all FDWs can
>> >> implement all callbacks for two-phase commit without codes that could
>> >> emit errors.
>> >
>> >
>> > Yeah, but if you are in the commit hook and someone emits an error, that's wrong because that then tries to rollback an already committed transaction and the backend rightfully panics. In fact I should probably strip out the precondition check errors there and issue a warning. It might sometimes happen when something goes seriously wrong on a system level, but....
>>
>> In my patch since the commit hook is performed by the background
>> worker not by the backends it's no problem if someone emits an error
>> in the commit hook. After the backend prepared transactions on the all
>> remote side, it enqueue itself to the wait queue. The background
>> worker gets the global transaction waiting to be completed and commit
>> prepared transaction on all remote side. After completed the global
>> transaction the background worker dequeue it.
>
>
> Seems sane. I was firing off one background worker per global transaction that needed cleanup. This might be an area to think about in terms of questions of larger parallelism in remote writes.
>>
>>
>> >>
>> >>
>> >> >
>> >> > There is a possible of system errors if one can no longer write to the file log but at this point as long as we have logged the phase change to commit we are able to recover later.
>> >> >
>> >> > So in the event where one sees an error here one continues on to the next transaction in the global transaction set and tries to commit it, etc. until it runs through the entire set of prepared transactions. Then if there were any errors it fires off a background worker which re-reads the log file and goes out to the various foreign servers, checks to see if there is a prepared transaction, and if so commits it. If the transaction set state was in rollback, it tries to roll it back instead. If this errors,, it sleeps for a second and then loops through those which errored and retries until all are complete.
>> >>
>> >> Yeah, the patch has the similar functionality.
>> >>
>> >> > The other thing is we record whether we are committing or rolling back the transaction when we hit the commit or rollback hook. This is critical because we can imagine a world where the Oracle FDW supports similar semantics. In that case everything works and is not ordering dependent. I.e. we can prepare our transactions. Oracle can try and fail, and rollback, and we rollback all the transactions everywhere. And all we have to know was we got to the precommit hook and then we rolled back.
>> >>
>> >> In my patch the global transaction manager manages each status of
>> >> foreign servers participating in global transactions with WAL logging.
>> >> The fate of transaction on foreign server will be determined according
>> >> to the state of local transaction and their status. WAL logging is
>> >> important because not only in term of speedup but also supporting
>> >> streaming replication.
>> >
>> >
>> > So you are optimizing for large numbers of prepared transactions or at a high rate?
>>
>> I don't do optimizations much for the patch as this is the first
>> implementation. Once the basic feature committed I will do that.
>>
>> >
>> > Also does the background worker get fired again on recovery as needed?
>>
>> No. I added new SQL function to fix global transactions. We need to
>> execute that function manually after recovery.
Oops, sorry I was wrong. The background workers will get fired
automatically by the launcher process after the recovery.
>
>
> And that is solely on a case where the db does a full restart. Is it possible the background worker could survive a backend restart?
Yes, all background worker resolving foreign transactions are launched
by the launcher. So the backend failure doesn't affect the background
worker and even if they fail then the launcher will launch them again
as long as there is not-completed global transaction.
>>
>>
>> >>
>> >>
>> >> >>
>> >> >> The design the patch chose is making backends do only PREPARE and wait
>> >> >> for the background worker to complete COMMIT PREPARED. In this design
>> >> >> the clients get a "committed" only either when successful in commit on
>> >> >> all participants or when they cancel the query explicitly. In other
>> >> >> words, the client will wait for completion of 2nd phase of two-phase
>> >> >> commit forever unless it cancels.
>> >> >
>> >> >
>> >> > In this approach we make a best effort to commit or rollback (as appropriate in the state of the global transaction) *all* remote transactions during global commit or global rollback. It is not guaranteed but it avoids breaking semantics as much as we can. Also the background worker here does not need to attach to shared memory since the log has everything required. COMMIT PREPARED ought to be a fast operation unless there are network problems but those can affect prepare as well.
>> >> >
>> >> > Also imagine a case where you are writing to three dbs. One is on Oracle, one on DB2, and one on PostgreSQL You successfully prepare your transaction. DB2 successfully prepares, and then the Oracle db errors for some reason (maybe a deferred constraint). Does the background worker have enough information to know to roll back your transaction on the remote side?
>> >>
>> >> I think that what the background worker needs to know to rollback
>> >> remote transactions are how to rollback and what to rollback. How to
>> >> rollback is defined in each FDWs.
>> >
>> >
>> > Agreed. And naturally same with commits.
>> >
>> > My assumption is that each foreign data wrapper would have to set its own precommit/commit hook callbacks. I think your patch extends the fdw structure to try to ensure these are done automatically?
>>
>> Yes. The patch adds new FDW APIs for the atomic commit such as
>> prepare, commit, rollback, resolve(2nd phase of 2PC). The FDW
>> developers who want make their FDW support the atomic commit need to
>> define these API and call the registration function when transaction
>> starts. If the FDW of the registered foreign server doesn't support
>> FDW's atomic commit API the transaction emit an error.
>>
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2018-08-22 10:28:45 | Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c) |
Previous Message | Andrey Borodin | 2018-08-22 08:35:47 | Re: Proposal: SLRU to Buffer Cache |