From: | Arjun Ranade <ranade(at)nodalexchange(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? |
Date: | 2017-11-01 02:37:28 |
Message-ID: | CANrrCRxcFdXQvae7vrtWttHtXbVmB8yoYEt+2DhfADqm1fvb-g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
You might be onto something. I force quit the pglogical supervisor and
other session that had a backend_start of a few months ago. That didn't
fix the issue. Additionally, those processes didn't start back up which is
making me worry about the status of my logical replication. The
subscribers are reporting that they are still replicating but I won't
know.
However... when I was checking on the subscribers, I dropped their
replication links and recreated them (to see if that would get the
pglogical processes to start back up). I checked the pg_log on the master
(same server having the vacuum problem) to see if there was anything about
the subscriptions resuming and I found these messages in the log:
2017-10-31 22:16:42.658 EDT pid=28445 database= user= rhost= tid=3/2970517
sessionid=59f92ded.6f1d WARNING: oldest xmin is far in the past
2017-10-31 22:16:42.658 EDT pid=28445 database= user= rhost= tid=3/2970517
sessionid=59f92ded.6f1d HINT: Close open transactions soon to avoid
wraparound problems.
2017-10-31 22:16:45.815 EDT pid=28955 database= user= rhost= tid=3/2970521
sessionid=59f92e8d.711b WARNING: oldest xmin is far in the past
2017-10-31 22:16:45.815 EDT pid=28955 database= user= rhost= tid=3/2970521
sessionid=59f92e8d.711b HINT: Close open transactions soon to avoid
wraparound problems
However, I am now in a situation where I cannot find any old transactions
from pg_stat_activity or pg_prepared_xacts.
It may also be worth noting that the last time our db was restarted (~6
months ago, same date as the old processes that I killed) was due to the
datafile partition filling up.
FWIW, here are my vacuum related settings on this db:
name setting
autovacuum on
autovacuum_analyze_scale_factor 0.1
autovacuum_analyze_threshold 50
autovacuum_freeze_max_age 200000000
autovacuum_max_workers 3
autovacuum_multixact_freeze_max_age 400000000
autovacuum_naptime 60
autovacuum_vacuum_cost_delay 20
autovacuum_vacuum_cost_limit -1
autovacuum_vacuum_scale_factor 0.2
autovacuum_vacuum_threshold 50
autovacuum_work_mem -1
log_autovacuum_min_duration -1
vacuum_cost_delay 0
vacuum_cost_limit 200
vacuum_cost_page_dirty 20
vacuum_cost_page_hit 1
vacuum_cost_page_miss 10
vacuum_defer_cleanup_age 0
vacuum_freeze_min_age 50000000
vacuum_freeze_table_age 150000000
vacuum_multixact_freeze_min_age 5000000
vacuum_multixact_freeze_table_age 150000000
On Tue, Oct 31, 2017 at 6:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Arjun Ranade <ranade(at)nodalexchange(dot)com> writes:
> > There is nothing in pg_prepared_xacts, however, in pg_stat_activity there
> > are two pglogical processes that have a "backend_start" of "2017-06-17"
> > when the last time we restarted this server. Both are not waiting and
> have
> > null for "state." This might be expected behavior for pglogical.
>
> I don't know much about pglogical, but if it's holding back the xmin
> horizon (as it appears to be doing) that is a really bad pglogical bug.
>
> If you try vacuum verbose (doesn't have to be FULL) on some table that
> gets lots of update traffic, does it report a large number of dead-but-
> not-removable rows?
>
> > I did try TRUNCATE before but even as the postgres user, but for some
> > reason it didn't allow me to truncate because it was a system table.
>
> Mmm. You could get around that, but possibly not without restarting
> in single-user mode, which is probably more interruption in service
> than you want. It might be easier to force-quit the pglogical sessions.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Ashok Kumar Tiwari | 2017-11-01 05:29:20 | ream and processor required |
Previous Message | Tom Lane | 2017-10-31 22:12:39 | Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? |