Re: invalid non-zero objectSubId for object class

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: invalid non-zero objectSubId for object class
Date: 2020-07-10 00:14:49
Message-ID: CACxu=vKvqpEti11owAKEX8n4NVeoBnP+c=pkgTd7i8+Uoongww@mail.gmail.com
Lists: pgsql-general

On Thu, Jul 9, 2020 at 4:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com> writes:
> > On a 12.3 AWS RDS instance, I get the following error when trying to drop
> > either of two tables:
>
> > dev=> drop table current_flight;
> > ERROR: invalid non-zero objectSubId for object class 297108
> > dev=> drop table flight;
> > ERROR: invalid non-zero objectSubId for object class 297108
>
> This looks like corrupt data in pg_depend, specifically an entry or
> entries with classid or refclassid = 297108, which should not happen
> (the classid should always be the OID of one of a short list of system
> catalogs). You could try poking around in pg_depend to see if you
> can identify any obviously-bogus rows.
>

Hi Tom, thanks for getting back so quickly.

I don't seem to have any rows with either column set to that OID:

dev=> select * from pg_depend where classid = 297108 or refclassid = 297108;
classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
(0 rows)

I'm not sure what a bogus row would look like.
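A broader query than the one above might look like the following. It is only a sketch. It assumes the error comes from the server's check that only pg_class entries may carry a non-zero sub-ID, and that classid/refclassid should always be the OID of a catalog in pg_catalog (classid is 0 on the pinned-object rows PostgreSQL 12 keeps in pg_depend). It only reads pg_depend and changes nothing:

```sql
-- Sketch: list pg_depend rows that break the invariants behind this error.
-- Assumption: a non-zero objsubid/refobjsubid is only valid for pg_class
-- entries (column numbers), and classid/refclassid must name a system catalog.
SELECT d.*
FROM pg_depend AS d
WHERE (d.objsubid <> 0 AND d.classid <> 'pg_class'::regclass)
   OR (d.refobjsubid <> 0 AND d.refclassid <> 'pg_class'::regclass)
   -- classid is 0 on "pin" rows, so only check non-zero values
   OR (d.classid <> 0 AND NOT EXISTS (
         SELECT 1 FROM pg_class AS c
         WHERE c.oid = d.classid
           AND c.relnamespace = 'pg_catalog'::regnamespace))
   OR NOT EXISTS (
         SELECT 1 FROM pg_class AS c
         WHERE c.oid = d.refclassid
           AND c.relnamespace = 'pg_catalog'::regnamespace);
```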

> No idea how it got that way. Have you had any database crashes or the
> like?
>

No crashes, but a restart and one upgrade. At exactly midnight UTC we run a
cron job that creates a new partition for an unrelated table and attaches it
to a pglogical replication set. On Saturday I updated the procedure to create
two new partitions for two unrelated tables, and on Sunday and Monday that
somehow caused an error (but not a crash) on 12.2 / pglogical 2.3.0. What's
puzzling is that creating the two partitions still worked, and they
replicated to all downstream consumers, but from that point on replication
ceased and the consumers logged the error in the link below:

https://github.com/2ndQuadrant/pglogical/issues/267

This spooled up changes on the RDS primary until it filled up the storage.
On Sunday we resized the instance, restarted it, and reinitialized the
pglogical setup, which restarted replication. On Monday the error happened
again at midnight, so we restarted replication, and on Tuesday we upgraded to
12.3 / 2.3.1 as recommended in the issue. Since then it has run without error
and has been replicating nicely, so I have assumed that issue is fixed.

Neither of these two tables is involved in the midnight job; they're no
longer used and I was hoping to clean them up. I guess my questions are: is
there any other possible corruption I can check for? And if everything else
is OK, is there some manual intervention I can do to drop the tables?

Thanks,

-Michel

>
> regards, tom lane
>
