Re: Logical replication lag in seconds

From: Klaus Darilion <klaus(dot)mailinglists(at)pernau(dot)at>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication lag in seconds
Date: 2020-03-04 17:33:50
Message-ID: 264f1acc-f79b-961f-72b4-dda3ea6c9834@pernau.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

For the records - with a simple script I hacked a solution which is
purely based on the server.

1. Create a table to track the timestamp of an lsn:

CREATE TABLE lsn2date(
lsn pg_lsn PRIMARY KEY,
seen timestamp NOT NULL DEFAULT NOW()
);
CREATE ROLE replication_lag_user WITH LOGIN PASSWORD 'xxx';
GRANT ALL ON TABLE lsn2date TO replication_lag_user;

2. Create a script which populates the table:

# cat /etc/systemd/system/calculate_logical_replication_lag.service
[Unit]
Description=Start and auto restart service

[Install]
WantedBy=multi-user.target

[Service]
ExecStart=/usr/bin/php /path/to/calculate_logical_replication_lag.php
Restart=always
RestartSec=10

# cat calculate_logical_replication_lag.php
<?php

$dbuser="replication_lag_user";
$dbpass="xxx";
if (!$dbconn = pg_pconnect('host=127.0.0.1 dbname=mydb user='.$dbuser.'
password='.$dbpass)) {
print "Sorry, database connection failed";
exit;
}

$accuracy = 10; // in seconds

while (1) {
$dbq = pg_query("INSERT INTO lsn2date (lsn) VALUES
(pg_current_wal_lsn())");
if ($dbq === FALSE) {
mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
exit(1);
}

$dbq = pg_query("DELETE FROM lsn2date WHERE lsn < (".
"SELECT lsn FROM lsn2date WHERE lsn < (".
"SELECT confirmed_flush_lsn FROM
pg_replication_slots ORDER BY confirmed_flush_lsn ASC LIMIT 1".
") ORDER BY lsn DESC LIMIT 1".
")"
);
if ($dbq === FALSE) {
mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
exit(1);
}
sleep($accuracy);
}

3. Get the lag, using a function which compares the lsn of the
replication_slots with the lsn/timestamp in the lsn2date table:

CREATE OR REPLACE FUNCTION get_replication_lag() RETURNS TABLE
(subscriber name, lag bigint) AS
$BODY$
DECLARE
subscriber name;
BEGIN
FOR subscriber IN
SELECT slot_name FROM pg_replication_slots
LOOP
RETURN QUERY SELECT slot_name, EXTRACT(EPOCH FROM
NOW()-seen)::bigint lag from lsn2date,pg_replication_slots WHERE
slot_name=subscriber AND lsn < confirmed_flush_lsn ORDER BY lsn DESC
LIMIT 1;
END LOOP;
RETURN;
END
$BODY$
LANGUAGE plpgsql;

# SELECT * FROM get_replication_lag() ;
subscriber | lag
------------+-----
reg_sjc1 | 0
reg_ffm1 | 0
reg_tst2 | 0
reg_mia1 | 0
reg_jbg1 | 0
reg_ams1 | 0
reg_syy1 | 0
reg_wie1 | 0
reg_hkg1 | 0
reg_gnf1 | 0
reg_tor1 | 0
reg_sea1 | 0
reg_chi1 | 0
reg_dfw1 | 0
reg_sgp1 | 0
reg_lhr1 | 0

regards
Klaus

In response to

Browse pgsql-general by date

  From Date Subject
Next Message postggen2020 s 2020-03-04 18:38:55 Suggestion to reduce COPY command output to csv file
Previous Message Adrian Klaver 2020-03-04 16:19:41 Re: pg_dump and public schema