Re: ERROR could not access transaction/Could not open file pg_commit_ts

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR could not access transaction/Could not open file pg_commit_ts
Date: 2018-03-09 18:34:04
Message-ID: CAMa1XUiZZ7vHM8qupUELa0QnazJ5aUjLm55AF9a1+ZjkGupznA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Mar 9, 2018 at 10:43 AM, Jeremy Finzel <finzelj(at)gmail(dot)com> wrote:

> Hello -
>
> Here is our cluster setup:
>
> cluster_a 9.5.11 Ubuntu 16.04.4 LTS
> --> cluster_b (streamer) 9.5.11 Ubuntu 16.04.4 LTS
> --> cluster_c (streamer) 9.5.11 Ubuntu 16.04.4 LTS
>
> Very recently, we started seeing these errors when running a query on a
> specific table on the streamer:
>
> 2018-03-09 08:28:16.280 CST,"uname","foo",18692,"0.0.0.0:0
> ",5aa29292.4904,4,"SELECT",2018-03-09 07:56:34 CST,18/15992,0,*ERROR*
> ,58P01,"*could not access status of transaction 1035047007*","*Could not
> open file ""pg_commit_ts/9A45*"": No such file or directory."
>
> A little history on the cluster:
>
> - The most recent change we made was a point release upgrade
> from 9.5.5 to 9.5.11 on the master, and 9.5.9 to 9.5.11 for the 2 streamers
> - It is a very high WAL traffic reporting system.
> - We actually have synchronous_commit set to off. It's possible this
> could have bitten us and we are just now seeing issues, however there have
> been no crashes since the table in question was created.
> - We have run pg_repack on many tables on this cluster, but that also
> has not happened since over a month
> - We had a similar error of missing pg_commit_ts file over a year ago
> after an actual crash. We had serious issues getting the cluster to start,
> and had to resort to recreating the missing pg_commit_ts with null
> bytes (IIRC, we had a snapshot of the system which still showed the file),
> which worked but left us questioning what really caused the issue.
>
>
> The table that is causing the error has been in production and used fine
> since 2/15/2018 when it was created. It is fed by pglogical replication (v.
> 2.1.1 on subscriber) from a system running 9.6.1 and pglogical v. 1.2.1.
> The point release upgrade from earlier 9.5 did take place *after* this.
>
> However, we *only* just started seeing errors in the past 12 hours. The
> table was autovacuumed on master at 2018-03-08 18:18:15.532137-06, which
> was about 3 hours before the first user query errored, however, I saw that
> 2 hours after the autovac, there was another user query that worked
> successfully on the table. Not sure if related?
>
> Any insight/ideas would be much appreciated!
>
> Thanks,
> Jeremy
>

UPDATE: what is actually failing is a call to
pg_xact_commit_timestamp(xmin) on a given table under the view. We still
think we must have some corruption though with pg_commit_ts.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2018-03-09 18:40:52 Re: ERROR could not access transaction/Could not open file pg_commit_ts
Previous Message Tom Lane 2018-03-09 16:54:06 Re: Feature request: min/max for macaddr type