Re: Cluster seems broken after pg_basebackup

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Guillaume Drolet <droletguillaume(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Cluster seems broken after pg_basebackup
Date: 2015-02-09 17:51:33
Message-ID: CAECtzeU=7AW19Exgv8-p5F+RGNTD4y_qGTPmk5bQS-2TjzqQMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2015-02-09 18:40 GMT+01:00 Guillaume Drolet <droletguillaume(at)gmail(dot)com>:

> I tried starting the cluster again. Once again everything looked fine at
> the start (first three lines of this log, in English this time):
>
> 2015-02-09 11:40:55 EST LOG: database system was shut down at 2015-02-06
> 09:50:21 EST
> 2015-02-09 11:40:55 EST LOG: database system is ready to accept
> connections
> 2015-02-09 11:40:55 EST LOG: autovacuum launcher started
>
> Since it seemed to work, I opened the terminal and tried connecting to the
> database:
>
> C:\Users\admlocal>psql -U postgres -d mortalite
>
> So far so good, I got a connection:
>
> psql (9.3.5)
> Attention : l'encodage console (850) diffère de l'encodage Windows (1252).
> Les caractères 8 bits peuvent ne pas fonctionner correctement.
> Voir la section « Notes aux utilisateurs de Windows » de la
> page
> référence de psql pour les détails.
> Saisissez « help » pour l'aide.
>
> mortalite=#
>
> I tried the help command and it worked:
>
> mortalite=# help
> Vous utilisez psql, l'interface en ligne de commande de PostgreSQL.
> Saisissez:
> \copyright pour les termes de distribution
> \h pour l'aide-mémoire des commandes SQL
> \? pour l'aide-mémoire des commandes psql
> \g ou point-virgule en fin d'instruction pour exécuter la requête
> \q pour quitter
>
>
> But then when I tried to query the db, it crashed:
>
> mortalite=# \dt
> la connexion au serveur a été coupée de façon inattendue
> Le serveur s'est peut-être arrêté anormalement avant ou durant le
> traitement de la requête.
> La connexion au serveur a été perdue. Tentative de réinitialisation :
> Échec.
> !>
>
> And here's the rest of the log file after the crash:
>
> 2015-02-09 12:29:19 EST LOG: server process (PID 2240) was terminated by
> exception 0xC0000005
> 2015-02-09 12:29:19 EST DETAIL: Failed process was running: SELECT
> n.nspname as "Schema",
> c.relname as "Name",
> CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm'
> THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence'
> WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
> pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
> FROM pg_catalog.pg_class c
> LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
> WHERE c.relkind IN ('r','')
> AND n.nspname <> 'pg_catalog'
> AND n.nspname <> 'information_schema'
> AND n.nspname !~ '^pg_toast'
> AND pg_catalog.pg_table_is_visible(c.oid)
> ORDER BY 1,2;
> 2015-02-09 12:29:19 EST HINT: See C include file "ntstatus.h" for a
> description of the hexadecimal value.
> 2015-02-09 12:29:19 EST LOG: terminating any other active server processes
> 2015-02-09 12:29:19 EST WARNING: terminating connection because of crash
> of another server process
> 2015-02-09 12:29:19 EST DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2015-02-09 12:29:19 EST HINT: In a moment you should be able to reconnect
> to the database and repeat your command.
> 2015-02-09 12:29:19 EST LOG: archiver process (PID 4576) exited with exit
> code 1
> 2015-02-09 12:29:19 EST LOG: all server processes terminated;
> reinitializing
> 2015-02-09 12:29:29 EST FATAL: pre-existing shared memory block is still
> in use
> 2015-02-09 12:29:29 EST HINT: Check if there are any old server processes
> still running, and terminate them.
>
> According to this page
> <https://msdn.microsoft.com/en-ca/library/cc704588.aspx>, exception
> 0xC0000005 means STATUS_ACCESS_VIOLATION - The instruction at 0x%08lx
> referenced memory at 0x%08lx. The memory could not be %s. This is not of
> much help to me.
>
> I hope these additional bits of information can help someone figuring out
> a solution to get my cluster up and running again.
>
> PS. I was thinking of reinstalling PGSQL over my current install but
> keeping my PGDATA. I've done it in the past for fixing problems with
> starting the service and it worked. What do you think?
>
>
The SQL query you see in your log is the result of your \dt. It should
work. I don't see why reinstalling PostgreSQL will fix anything here.

You said you did a pg_basebackup. The cluster you try to start is the
cluster used with pg_basebackup or a restore done with the pg_basebackup?

Cheers,
>
> Guillaume
>
>
>
>
>
> 2015-02-09 11:37 GMT-05:00 Guillaume Drolet <droletguillaume(at)gmail(dot)com>:
>
>
>>
>> 2015-02-07 1:24 GMT-05:00 Guillaume Lelarge <guillaume(at)lelarge(dot)info>:
>>
>>> Le 6 févr. 2015 17:31, "Adrian Klaver" <adrian(dot)klaver(at)aklaver(dot)com> a
>>> écrit :
>>> >
>>> > On 02/06/2015 05:03 AM, Guillaume Drolet wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Yesterday I ran a pg_basebackup of my cluster. Since it has completed,
>>> >> my cluster doesn't work properly. I tried restarting the computer (or
>>> >> service) a few times but I always get the same messages in my logs
>>> (it's
>>> >> in French. If someone is willing to help me I can try to translate the
>>> >> logs. Just ask):
>>> >
>>> >
>>> > Enter Google Translate:)
>>> >
>>>
>>> But first, Guillaume, do yourself and everyone else a favor: turn the
>>> dam log into English. Set lc_messages to 'C' in postgresql.conf.
>>>
>> Thanks for this! I didn't know about this great feature.
>>
>>
>>> > First some questions:
>>> >
>>> > 1) What Postgres version?
>>> >
>>> > 2) What OS(s)? I am assuming Windows from the log info below, but we
>>> all know what assuming gets you.
>>> >
>>> > 3) Where were you backing up from and to?
>>> >
>>> > 4) Which cluster does not start, the master or the child you created
>>> with pg_basebackup?
>>> >
>>> >
>>> >>
>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données a été
>>> >> interrompu ; dernier lancement connu à 2015-02-06 07:05:05 EST
>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données n'a pas
>>> été
>>> >> arrêté proprement ; restauration
>>> >> automatique en cours
>>> >> 2015-02-06 07:11:38 EST LOG: record with zero length at 24B/2C000160
>>> >> 2015-02-06 07:11:38 EST LOG: la ré-exécution n'est pas nécessaire
>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données est prêt
>>> >> pour accepter les connexions
>>> >> 2015-02-06 07:11:38 EST LOG: lancement du processus autovacuum
>>> >> 2015-02-06 07:11:38 EST FATAL: le rôle « 208375PT$ » n'existe pas
>>> >
>>> >
>>> > So where is role 208375PT$ supposed to come from?
>>> >
>>> >
>>> >>
>>> >> Then if I start pgAdmin I get a series of pop-ups I have to click OK
>>> to
>>> >> to continue:
>>> >>
>>> >> An error has ocurred: Column not found in pgSet: "datlastsysoid"
>>> >> An error has ocurred: Column not found in pgSet: datlastsysoid
>>> >> An error has ocurred: Column not found in pgSet: oid
>>> >> An error has ocurred: Column not found in pgSet: encoding
>>> >> An error has ocurred: Column not found in pgSet: Connection to
>>> database
>>> >> broken
>>> >
>>> >
>>> > Not sure about that this, someone more versed in pgAdmin will have to
>>> answer.
>>> >
>>>
>>> Usually you see these messages when you're using a pgadmin major release
>>> older than a PostgreSQL make release. For a 9.3 release, that would mean a
>>> pgadmin older than 1.18.
>>>
>>
>> I'm running pgadmin 1.18.1
>>
>>> >
>>> >>
>>> >> And after that, I went back to the log file and there's new
>>> information
>>> >> added:
>>> >>
>>> >> 2015-02-06 07:51:05 EST LOG: processus serveur (PID 184) a été arrêté
>>> >> par l'exception 0x80000004
>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le processus qui a échoué exécutait :
>>> >> SELECT version();
>>> >> 2015-02-06 07:51:05 EST ASTUCE : Voir le fichier d'en-tête C «
>>> >> ntstatus.h » pour une description de la valeur
>>> >> hexadécimale.
>>> >
>>> >
>>> > Well according to here:
>>> >
>>> > https://msdn.microsoft.com/en-us/library/cc704588.aspx
>>> >
>>> > 0x80000004
>>> > STATUS_SINGLE_STEP
>>> >
>>> >
>>> > {EXCEPTION} Single Step A single step or trace operation has just been
>>> completed.
>>> >
>>> > A developer is going to have explain what that means.
>>> >
>>> >
>>> >
>>> >> 2015-02-06 07:51:05 EST LOG: arrêt des autres processus serveur
>>> actifs
>>> >> 2015-02-06 07:51:05 EST ATTENTION: arrêt de la connexion à cause de
>>> >> l'arrêt brutal d'un autre processus serveur
>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le postmaster a commandé à ce
>>> processus
>>> >> serveur d'annuler la transaction
>>> >> courante et de quitter car un autre processus serveur a quitté
>>> >> anormalement
>>> >> et qu'il existe probablement de la mémoire partagée corrompue.
>>> >> 2015-02-06 07:51:05 EST ASTUCE : Dans un moment, vous devriez être
>>> >> capable de vous reconnecter à la base de
>>> >> données et de relancer votre commande.
>>> >> 2015-02-06 07:51:05 EST LOG: processus d'archivage (PID 692) quitte
>>> >> avec le code de sortie 1
>>> >> 2015-02-06 07:51:05 EST LOG: tous les processus serveur se sont
>>> >> arrêtés, réinitialisation
>>> >> 2015-02-06 07:51:15 EST FATAL: le bloc de mémoire partagé
>>> pré-existant
>>> >> est toujours en cours d'utilisation
>>> >> 2015-02-06 07:51:15 EST ASTUCE : Vérifier s'il n'y a pas de vieux
>>> >> processus serveur en cours d'exécution. Si c'est le
>>> >> cas, fermez-les.
>>> >>
>>> >> I was about to try restarting postgresql using the base backup I made
>>> >> yesterday but since this means I'll have to copy my database again
>>> (700
>>> >> GB takes a while...) I am looking for a better solution from more
>>> >> experienced people.
>>> >
>>> >
>>> >
>>> > My suspicion is you copied at least partly over a running server.
>>> >
>>>
>>
>>
>

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Guillaume Drolet 2015-02-09 18:39:24 Re: Cluster seems broken after pg_basebackup
Previous Message Guillaume Drolet 2015-02-09 17:40:15 Re: Cluster seems broken after pg_basebackup