Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619
Date: 2019-08-30 14:26:55
Message-ID: 20190830142655.GA14011@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Moving this old thread to -hackers
https://www.postgresql.org/message-id/flat/20180519142603.GA30060%40telsasoft.com

I wanted to mention that this seems to still be an issue, now running pg11.5.

log_time | 2019-08-30 23:20:00.118+10
user_name | postgres
database | ts
session_id | 5d69227e.235
session_line | 1
command_tag | CLUSTER
session_start_time | 2019-08-30 23:19:58+10
error_severity | ERROR
sql_state_code | XX000
message | unexpected chunk number 1 (expected 0) for toast value 2369261203 in pg_toast_2619
query | CLUSTER pg_statistic USING pg_statistic_relid_att_inh_index
application_name | psql

Note that my original report was for "missing" chunk during "VACUUM FULL", and
the current error is "unexpected chunk" during CLUSTER. I imagine that's
related issue. I haven't seen this in awhile (but stopped trying to reproduce
it long ago). A recently-deployed update to this maintenance script is
probably why it's now doing CLUSTER.

On Fri, May 25, 2018 at 08:49:50AM -0500, Justin Pryzby wrote:
> On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> > Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> > > [pryzbyj(at)database ~]$ while :; do for db in `psql postgres -Atc "SELECT datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic pg_attrdef pg_constraint; do echo "$db.$t..."; PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; done; done
> >
> > > ...
> > > postgres.pg_statistic...
> > > postgres.pg_attrdef...
> > > postgres.pg_constraint...
> > > template1.pg_statistic...
> > > template1.pg_attrdef...
> > > template1.pg_constraint...
> > > ts.pg_statistic...
> > > ERROR: canceling statement due to statement timeout
> > > ts.pg_attrdef...
> > > ts.pg_constraint...
> > > postgres.pg_statistic...
> > > ERROR: missing chunk number 0 for toast value 3372855171 in pg_toast_2619
> >
> > Hm, so was the timeout error happening every time through on that table,
> > or just occasionally, or did you provoke it somehow? I'm wondering how
> > your 9s timeout relates to the expected completion time.
>
> Actually statement_timeout isn't essential for this (maybe it helps to triggers
> it more often - not sure).
>
> Could you try:
> time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; done'; date
>
> Three servers experienced error within 30min, but one server didn't fail until
> 12h later, and a handful others still haven't failed..
>
> Does this help at all ?
> 2018-05-24 21:57:49.98-03 | 5b075f8d.1ad1 | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
> 2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619"
> 2018-05-24 21:57:50.09-03 | 5b075f8d.1ad1 | INFO | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in 99 pages
> 2018-05-24 21:57:50.12-03 | 5b075f8e.1ada | LOG | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic
> 2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic"
> 2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR | pryzbyj | postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619
>
> Some thing; this server has autovacuum logging, although it's not clear to me
> if that's an essential component of the problem, either:
> 2018-05-24 21:16:39.856-06 | LOG | 5b078017.7b99 | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
> 2018-05-24 21:16:39.876-06 | LOG | 5b078010.7968 | | | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 +
> | | | | | pages: 0 removed, 117 r
> 2018-05-24 21:16:39.909-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | vacuuming "pg_toast.pg_toast_2619"
> 2018-05-24 21:16:39.962-06 | INFO | 5b078017.7b99 | pryzbyj | postgres | "pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages
> 2018-05-24 21:16:40.025-06 | LOG | 5b078018.7b9b | pryzbyj | postgres | statement: VACUUM FULL VERBOSE pg_statistic
> 2018-05-24 21:16:40.064-06 | INFO | 5b078018.7b9b | pryzbyj | postgres | vacuuming "pg_catalog.pg_statistic"
> 2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj | postgres | missing chunk number 0 for toast value 765874692 in pg_toast_2619
>
> Or this one?
>
> postgres=# SELECT log_time, database, user_name, session_id, left(message,999) FROM postgres_log WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24 19:58') AND (database='postgres' OR database IS NULL OR user_name IS NULL OR user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1;
>
> log_time | 2018-05-24 19:56:35.396-04
> database |
> user_name |
> session_id | 5b075131.3ec0
> left | skipping vacuum of "pg_toast_2619" --- lock not available
>
> ...
>
> log_time | 2019-05-24 19:57:35.78-04
> database |
> user_name |
> session_id | 5b07516d.445e
> left | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1
> : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped frozen
> : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, oldest xmin: 63803106
> : buffer usage: 179 hits, 4 misses, 87 dirtied
> : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s
> : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s
>
> log_time | 2018-05-24 19:57:35.879-04
> database | postgres
> user_name | pryzbyj
> session_id | 5b07516f.447f
> left | missing chunk number 0 for toast value 624341680 in pg_toast_2619
>
> log_time | 2018-05-24 19:57:44.332-04
> database |
> user_name |
> session_id | 5af9fda3.70d5
> left | checkpoint starting: time

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2019-08-30 14:52:04 Re: SQL equivalint of #incude directive ?
Previous Message Stephen Frost 2019-08-30 14:21:37 Re: SSPI auth and mixed case usernames