RE: crash pg_restore

From: Alain Benard <alain(dot)benard(at)inrae(dot)fr>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: "pgsql-fr-generale(at)postgresql(dot)org" <pgsql-fr-generale(at)postgresql(dot)org>
Subject: RE: crash pg_restore
Date: 2021-06-23 18:19:21
Message-ID: 06015857e372449abccea0939f1a4ec9@IDFDCPRIPEXMU06.inra.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-fr-generale


De : Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Envoyé : mercredi 23 juin 2021 16:20
À : Alain Benard <alain(dot)benard(at)inrae(dot)fr>
Cc : pgsql-fr-generale(at)postgresql(dot)org
Objet : Re: crash pg_restore

Le mer. 23 juin 2021 à 16:06, Alain Benard <alain(dot)benard(at)inrae(dot)fr<mailto:alain(dot)benard(at)inrae(dot)fr>> a écrit :
Ok voici ce que j’ai fait :

• Suppression de toutes les autres bases du serveur (il n’est pas accessible ni connu d’autres user pour l’instant).

• Arrêt Postgresql 12 / Suppression fichier log / Redémarrage PG 12

• Lancement de la restauration : pg_restore -C --dbname=template1 --clean --if-exists --format=c --verbose /data/save/2021-06-23db_bouquetins.gz 1>bouquetins 2>bouquetins.err

Voici le log complet Postgresql :
2021-06-23 14:01:31.775 CEST [4268] LOG: database system was shut down at 2021-06-23 14:01:10 CEST
2021-06-23 14:01:31.779 CEST [4265] LOG: database system is ready to accept connections
2021-06-23 14:08:13.962 CEST [4293] ERROR: canceling autovacuum task
2021-06-23 14:08:13.962 CEST [4293] CONTEXT: SQL function "raster_eq" statement 1
automatic analyze of table "db_bouquetins.env_data.t_pne_oso_2017"
2021-06-23 14:08:14.980 CEST [4296] ERROR: canceling autovacuum task
2021-06-23 14:08:14.980 CEST [4296] CONTEXT: SQL function "raster_eq" statement 1
automatic analyze of table "db_bouquetins.env_data.t_pnp_oso_2017"
free(): invalid pointer
free(): invalid pointer
2021-06-23 14:08:38.288 CEST [4265] LOG: background worker "parallel worker" (PID 4301) was terminated by signal 6: Aborted
2021-06-23 14:08:38.288 CEST [4265] LOG: terminating any other active server processes
2021-06-23 14:08:38.288 CEST [4285] WARNING: terminating connection because of crash of another server process
2021-06-23 14:08:38.288 CEST [4285] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-06-23 14:08:38.288 CEST [4285] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-06-23 14:08:38.296 CEST [4272] WARNING: terminating connection because of crash of another server process
2021-06-23 14:08:38.296 CEST [4272] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-06-23 14:08:38.296 CEST [4272] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-06-23 14:08:38.297 CEST [4299] WARNING: terminating connection because of crash of another server process
2021-06-23 14:08:38.297 CEST [4299] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-06-23 14:08:38.297 CEST [4299] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-06-23 14:08:38.336 CEST [4296] WARNING: terminating connection because of crash of another server process
2021-06-23 14:08:38.336 CEST [4296] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-06-23 14:08:38.336 CEST [4296] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-06-23 14:08:38.336 CEST [4296] CONTEXT: SQL function "raster_eq" statement 1
2021-06-23 14:08:38.725 CEST [4265] LOG: all server processes terminated; reinitializing
2021-06-23 14:08:39.033 CEST [4309] LOG: database system was interrupted; last known up at 2021-06-23 14:02:32 CEST
2021-06-23 14:08:39.381 CEST [4309] LOG: database system was not properly shut down; automatic recovery in progress
2021-06-23 14:08:39.390 CEST [4309] LOG: redo starts at 1A/E1553A28
2021-06-23 14:09:35.626 CEST [4309] LOG: invalid record length at 1B/B14992A0: wanted 24, got 0
2021-06-23 14:09:35.626 CEST [4309] LOG: redo done at 1B/B1499268
2021-06-23 14:09:39.061 CEST [4265] LOG: database system is ready to accept connections
Voici le petit bout de bouquetins.err où démarre l’erreur :
pg_restore: création de CHECK CONSTRAINT « env_data.t_pnra_oso_2017 enforce_max_extent_rast »
pg_restore: création de CONSTRAINT « env_data.t_cesbio t_cesbio_pkey »
pg_restore: WARNING: terminating connection because of crash of another server process
DÉTAIL : The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
ASTUCE : In a moment you should be able to reconnect to the database and repeat your command.
pg_restore: pendant le traitement de la TOC (« PROCESSING TOC ») :
pg_restore: de l'entrée TOC 5292 ; 2606 832468 CONSTRAINT t_cesbio t_cesbio_pkey ychaval
pg_restore: erreur : could not execute query: la connexion au serveur a été coupée de façon inattendue
Le serveur s'est peut-être arrêté anormalement avant ou durant le
traitement de la requête.
J’espère que tout y est. Est l’autovacuum ?? Du coup j’ai essayé avec autovacuum = off dans Postgresql.conf et voici le résultat :
2021-06-23 14:33:14.225 CEST [4415] LOG: background worker "parallel worker" (PID 4437) was terminated by signal 6: Aborted
2021-06-23 14:33:14.225 CEST [4415] LOG: terminating any other active server processes
2021-06-23 14:33:14.226 CEST [4435] WARNING: terminating connection because of crash of another server process
2021-06-23 14:33:14.226 CEST [4435] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-06-23 14:33:14.226 CEST [4435] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-06-23 14:33:14.732 CEST [4415] LOG: all server processes terminated; reinitializing
2021-06-23 14:33:14.989 CEST [4443] LOG: database system was interrupted; last known up at 2021-06-23 14:28:14 CEST
2021-06-23 14:33:15.159 CEST [4443] LOG: database system was not properly shut down; automatic recovery in progress
2021-06-23 14:33:15.175 CEST [4443] LOG: redo starts at 1B/B14A0588
2021-06-23 14:34:22.565 CEST [4443] LOG: invalid record length at 1C/8130CF00: wanted 24, got 0
2021-06-23 14:34:22.565 CEST [4443] LOG: redo done at 1C/8130CEC8
2021-06-23 14:34:26.218 CEST [4415] LOG: database system is ready to accept connections
Les log postgres ne donnent donc visiblement pas grand-chose et probablement que le message d’erreur ne reflète pas vraiment la réalité.

Si, ils donnent quelques informations, notamment ces trois lignes :

free(): invalid pointer
free(): invalid pointer
2021-06-23 14:08:38.288 CEST [4265] LOG: background worker "parallel worker" (PID 4301) was terminated by signal 6: Aborted

Les deux premières ne viennent pas de PostgreSQL, donc probablement d'une extension (PostGIS ou une autre que vous avez installé). Et vu le message d'erreur en question, ça peut occasionner un crash du serveur.
La troisième indique que le processus qui s'est terminé est un processus de parallélisation (process worker). Un premier test possible serait de désactiver la parallélisation pour savoir si cela améliore les choses ou pas.

--
Guillaume.

Merci Guillaume,
J’ai mis max_parallel_workers = 1 (je ne sais si c’est vraiment ça qui désactive le parallélisme) et ça ne change rien. Je pense que ton incrimination de Postgis est certainement la bonne piste car on trouve quelques âmes malheureuses qui ont le même souci sans que j’ai vu de solution pour l’instant (par exemple : https://stackoverflow.com/questions/63433659/raster2pgsql-free-invalid-pointer-aborted-core-dumped) J’ai essayé de restaurer le dump avec un script perl fournit avec Postgis (que j’ai déjà souvent utilisé par le passé pour des migrations) : /usr/pgsql-12/share/contrib/postgis-2.5/postgis_restore.pl – le résultat est le même. Pg_restore n’est donc pas le responsable comme tu le soulignais.
Je vais poursuivre mes recherches
Alain.

In response to

Responses

Browse pgsql-fr-generale by date

  From Date Subject
Next Message Bruno Friedmann 2021-06-23 19:18:16 Re: crash pg_restore
Previous Message Cédric Villemain 2021-06-23 15:35:47 Re: crash pg_restore