From: | AYahorau(at)ibagroup(dot)eu |
---|---|
To: | pgsql-admin(at)postgresql(dot)org |
Cc: | MikalaiKeida(at)ibagroup(dot)eu |
Subject: | Re: Logical replication monitoring |
Date: | 2018-08-24 08:49:39 |
Message-ID: | OF6FD7F6E7.73FE5F83-ON432582F3.002D8C89-432582F3.00307DF7@iba.by |
Lists: | pgsql-admin |
Hello,
Thank you for the suggestion.
I increased the wal_receiver_timeout and wal_sender_timeout parameters, and
now this error no longer occurs.
I installed the tail_n_mail utility, wrote a simple config and started it in
debug mode.
I keep getting the same error:
WARNING! Skipping non-existent file
"/var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034"
Too many loops (20161): bailing:
The configuration file tail_n_mail.conf is quite standard:
EMAIL: someone(at)example(dot)com
PGLOG: log
MAILSUBJECT: Acme HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
FILE1: /var/lib/pgsql/pg_log/postgresql.log-%Y-%m-%d_%H%M%S
LASTFILE1: /var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034
Could you please tell me whether there is anything wrong with my
configuration or with the way I run the script?
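The warning above refers to the LASTFILE1 path. One way to see which files in the log directory actually match the FILE1 pattern, independently of tail_n_mail, is the sketch below. It is only an illustration: the function names are hypothetical, it handles only the six strftime escapes used in this pattern, and it does not reproduce how tail_n_mail itself resolves file names.

```python
import re

# Regex fragments for the strftime escapes that appear in the FILE1 pattern.
# Other escapes are deliberately not handled in this sketch.
_ESCAPES = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def pattern_to_regex(pattern):
    """Turn a strftime-style file name pattern into a compiled regex."""
    parts = re.split(r"(%[YmdHMS])", pattern)
    pieces = []
    for part in parts:
        if part in _ESCAPES:
            pieces.append(_ESCAPES[part])
        else:
            # Literal text: escape it so "." matches only a dot.
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


def matching_files(pattern, names):
    """Return the sorted file names that fit the pattern."""
    rx = pattern_to_regex(pattern)
    return sorted(name for name in names if rx.match(name))
```

Calling matching_files("postgresql.log-%Y-%m-%d_%H%M%S", os.listdir("/var/lib/pgsql/pg_log")) would list the candidate log files, which can then be compared with the LASTFILE1 entry.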
Thank you,
Andrei Yahorau
From: Andrei Yahorau/IBA
To: pgsql-admin(at)postgresql(dot)org,
Cc: Mikalai Keida/IBA(at)IBA
Date: 13/08/2018 13:16
Subject: Re: Logical replication monitoring
Hello!
Thank you for your suggestion.
I am afraid this approach is not suitable for me. As a rule, the PostgreSQL
log on the subscriber side contains a bunch of entries like the following:
ERROR: terminating logical replication worker due to timeout
00000 LOG: worker process: logical replication worker for subscription
24578 (PID 6217) exited with exit code 1
How should I handle this situation?
As I understand it, this is a quite normal situation. But why is its
severity ERROR?
I have another assumption. Please correct me if I am wrong.
I found in the source code that termination of the logical replication
worker depends on the wal_receiver_timeout parameter.
So I propose setting wal_receiver_timeout to 0.
In that case I think that monitoring the views pg_stat_subscription,
pg_publication and pg_stat_replication is enough.
If there is a problem with the connection or with replication,
pg_stat_replication will show nothing because the WAL sender will not be
running; otherwise it will give some information.
Am I right? Are there any weaknesses in this approach?
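On the subscriber side, a check of this kind could be sketched as follows. This is a minimal illustration, not a complete monitoring solution. It assumes the rows of pg_stat_subscription have already been fetched into dictionaries by some database driver; the function name and the five-minute threshold are hypothetical. It relies on the documented behaviour that pid is NULL when the subscription's main worker is not running.

```python
from datetime import datetime, timedelta


def subscription_problems(rows, now, max_silence=timedelta(minutes=5)):
    """Return (subname, reason) pairs for subscriptions that look unhealthy.

    rows: dicts holding the pg_stat_subscription columns subname, pid and
          last_msg_receipt_time.
    now:  the current time, in the same time zone convention as the rows.
    max_silence: hypothetical threshold; tune it to the workload.
    """
    problems = []
    for row in rows:
        if row["pid"] is None:
            # No worker process is attached to this subscription.
            problems.append((row["subname"], "no worker process"))
            continue
        last = row["last_msg_receipt_time"]
        if last is None or now - last > max_silence:
            # The worker exists but has not heard from the publisher lately.
            problems.append((row["subname"], "no message received recently"))
    return problems
```

An empty result means every subscription has a running worker that has received a message within the threshold. It says nothing about whether the data on both sides is identical.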
Best regards,
Andrei Yahorau
From: Andrei Yahorau/IBA
To: pgsql-admin(at)postgresql(dot)org,
Cc: Mikalai Keida/IBA(at)IBA
Date: 10/08/2018 13:05
Subject: Logical replication monitoring
Hello PostgreSQL Community!
I configured logical replication for PostgreSQL 10.4 on 2 machines: I set
wal_level to logical, created a publication on the master node and created a
subscription on the standby node according to the PostgreSQL documentation.
Could you please suggest an approach for monitoring the replication state?
In my experience, monitoring pg_stat_subscription, pg_publication and
pg_replication_slots is unfortunately not enough for this purpose.
Moreover, the standby database does not prohibit write operations by
default, which can lead to inconsistency between the two databases.
For example, a chain of queries such as
SELECT pg_is_in_recovery(),
SELECT * FROM pg_stat_replication and
SELECT * FROM pg_stat_wal_receiver
provides insight into the replication state for hot_standby replication.
So is there a reliable way to monitor the replication state for logical
replication?
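One commonly used signal is the distance in bytes between the publisher's current WAL position (pg_current_wal_lsn()) and the confirmed_flush_lsn of the subscription's slot in pg_replication_slots. The sketch below shows the arithmetic on the textual LSN form, assuming both values have already been read from the publisher; the function names are hypothetical. On the server itself, pg_wal_lsn_diff() performs the same subtraction.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after it the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn, confirmed_flush_lsn):
    """Bytes of WAL the subscriber has not yet confirmed."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)
```

A lag that keeps growing while the publisher is being written to suggests that the subscriber is not consuming changes. What counts as too much lag depends on the workload and is left open here.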
Best regards,
Andrei Yahorau