Re: Error stopping postgresql service on a standby server.

From: Dipti Bharvirkar <dbharvirkar(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Error stopping postgresql service on a standby server.
Date: 2012-09-05 12:14:19
Message-ID: CAL097-FzG9Di=43WAvt-U8+sGMqDK7VbmyNK6xnth2mTqh-MjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

Has anyone encountered this issue? Why would the WAL receiver process not
stop when postmaster is shutdown?
Any suggestions on how to avoid running into this error or ways to recover
from it?

Thank you in advance for any inputs on this,

Dipti

On Fri, Aug 31, 2012 at 1:17 PM, Dipti Bharvirkar wrote:

> Hi,
>
> In our project, we use Postgres 9.1.3 version and asynchronous streaming
> replication.
> In recent times, on couple of our setups, we saw issues stopping Postgres
> service on the standby server after streaming replication was setup.
>
> The command "service postgresql stop" returned with a failure message. We
> use "pg_ctl stop -D '$PGDATA' -s -m fast" in the Postgres service script to
> stop the server.
> To see if there were some active client connections that were causing a
> failure in stopping Postgres service, I ran the query "SELECT * FROM
> pg_stat_activity;".
> It failed with the following error: psql: FATAL: the database system is
> shutting down
>
> "ps -ef | grep postgres" returned the following:
> postgres 14033 1 0 Aug28 ? 00:00:01
> /usr/pgsql-9.1/bin/postmaster -p 5432 -D /var/lib/pgsql/data
> postgres 14044 14033 0 Aug28 ? 00:00:00 postgres: logger process
> postgres 14046 14033 0 Aug28 ? 00:00:00 postgres: writer process
> postgres 14047 14033 0 Aug28 ? 00:00:00 postgres: stats collector
> process
> postgres 14912 14033 0 Aug28 ? 00:00:00 postgres: wal receiver
> process
> root 31519 3003 0 06:18 pts/2 00:00:00 grep postgres
>
> "netstat -anp | grep 5432" returns the following:
> tcp 0 0 0.0.0.0:5432 0.0.0.0:*
> LISTEN 14033/postmaster
> tcp 0 0 127.0.0.1:5432 127.0.0.1:60597
> TIME_WAIT -
> tcp 0 0 127.0.0.1:5432 127.0.0.1:60589
> TIME_WAIT -
> tcp 67288 0 1.1.1.1:61500 <http://47.11.49.176:61500> 2
> .2.2.2:5432 <http://47.11.49.190:5432> ESTABLISHED
> 14912/postgres: wal
>
> I had a few queries based on some of the observations -
>
> 1. On one of the setups where similar issue was observed, we stopped
> Postgres service on the master server. As a result of this, the sender
> process on the master server and consequently the receiver process on
> standby stopped. After this, Postgres service could successfully be stopped
> on the standby server. This fact coupled with the output of the two
> commands mentioned above makes me believe that it is the "wal receiver"
> process that is not getting terminated because of which the Postgres
> service on standby server does not stop. Is this assumption right?
> 2. If yes, what could be the possible cause for the receiver process
> to not terminate? Shouldn't it stop gracefully when a shutdown command is
> received? When the issue occurred, we had minimal activity on the master
> server. There were no long running transactions being committed to the
> master and streamed to the standby when the issue occurred. Even if there
> were, could it cause the receiver process to not terminate?
> 3. How can we avoid running into this issue? Could we be missing some
> step that is essential for a graceful shutdown of the service on a standby?
> 4. On one setup where the issue was seen, since "-m fast" option with
> "pg_ctl stop" did not help in stopping the service, I used the "-m
> immediate" option. The service stopped (I understand that this option
> aborts the processes without a clean shutdown and so is not a safe option).
> The service would not start back up. We saw the "invalid record length"
> error during the startup (I guess this was expected since it wasn't a clean
> shutdown). A pg_resetxlog helped recover from this issue. However, that
> seems risky too since there is a chance of data inconsistency. What is the
> best way to recover from this error if it occurs again?
>
> Thanks,
> Dipti
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Angelico 2012-09-05 13:44:08 Re: "Too far out of the mainstream"
Previous Message Herouth Maoz 2012-09-05 12:08:09 Re: Maintaining a materialized view only on a replica