Re: SynchRep; wait-forever and shutdown

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SynchRep; wait-forever and shutdown
Date: 2010-12-10 17:54:46
Message-ID: 4D026966.7020203@agliodbs.com
Lists: pgsql-hackers


> 3. Shutdown should abort all the blocking transactions?
> * Problem is that a client thinks that those transactions have been aborted
> even though those WAL records have been written on the master. But
> this is very common problem for DBMS, so we don't need to worry about
> this in the context of replication.

Hmmm. The WAL records are written as committed ... this is why people
get into 2PC if they want full synchronous. Short of using 2PC, there is
simply no way we can guarantee that the master and the standby won't get
out of sync. And even 2PC isn't perfect.

I think the best we can do is have the master abort the sessions and
shut down for a -fast. Yes, the clients are confused about what's been
committed, but frequently that's the case with a -fast anyway.

However, we need to give the user more information. I'd say that we
need to have a specific error message associated with a synchronization
failure around shutdown time. This error should be both returned to the
clients, and logged. That way the DBA can decide what to do about the
error, if anything.

So, I'd say this is the way to go:
Shutdown Smart:
    Wait for all pending standby transactions to clear.
    After 60 seconds, emit an error message on the shutdown console:
        NOTICE: pending replication transactions still waiting
    ... that way the DBA knows to move on to -fast

Shutdown Fast:
    Wait for 1 second for all pending standby transactions to clear.
    If they don't clear, emit an error to both the shutdown console
    and the client consoles:
        WARNING: some transactions not replicated
    Send a commit message on the client consoles.
    Shut down.
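
To make the two modes above concrete, here is a rough simulation in Python. It is only a sketch of the proposed behavior, not PostgreSQL code: the function name shutdown(), the pending_count callback, the polling interval and the (target, message) tuples are all made up for illustration. It also assumes the commit message in -fast mode is only sent when transactions are still unreplicated after the 1 second grace period, which is one possible reading of the steps.

```python
import time
from typing import Callable, List, Tuple

SMART_NOTICE_AFTER = 60.0  # seconds before the NOTICE in smart mode (from the proposal)
FAST_GRACE = 1.0           # seconds to wait in fast mode (from the proposal)

Message = Tuple[str, str]  # (target, text); hypothetical representation


def shutdown(mode: str,
             pending_count: Callable[[], int],
             now: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep,
             poll: float = 0.1) -> List[Message]:
    """Simulate the proposed shutdown modes and return the messages emitted.

    pending_count is a hypothetical callback returning how many
    transactions are still waiting on the standby.
    """
    messages: List[Message] = []
    start = now()

    if mode == "smart":
        noticed = False
        # Smart: wait for as long as it takes, but tell the DBA once
        # after 60 seconds that we are still waiting.
        while pending_count() > 0:
            if not noticed and now() - start >= SMART_NOTICE_AFTER:
                messages.append(
                    ("console",
                     "NOTICE: pending replication transactions still waiting"))
                noticed = True
            sleep(poll)
        return messages

    if mode == "fast":
        # Fast: give the standby a short grace period, then give up.
        while pending_count() > 0 and now() - start < FAST_GRACE:
            sleep(poll)
        if pending_count() > 0:
            warning = "WARNING: some transactions not replicated"
            messages.append(("console", warning))
            messages.append(("clients", warning))
            # Assumption: waiting clients get their commit message here,
            # since the WAL records are already written on the master.
            messages.append(("clients", "COMMIT"))
        return messages

    raise ValueError("unknown shutdown mode: %r" % mode)
```

Passing in now and sleep makes it possible to drive the sketch with a fake clock instead of actually waiting 60 seconds.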

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
