Re: Cluster seems broken after pg_basebackup

From: Guillaume Drolet <droletguillaume(at)gmail(dot)com>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Cluster seems broken after pg_basebackup
Date: 2015-02-09 18:39:24
Message-ID: CAOkiyv7yX-P7Oxe6zmvn-nwWK2Da-DSFoAwQudGmXesGvGAcWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Guillaume: the cluster I try to start is the one used with pg_basebackup,
not the result of the backup.

2015-02-09 12:51 GMT-05:00 Guillaume Lelarge <guillaume(at)lelarge(dot)info>:

> 2015-02-09 18:40 GMT+01:00 Guillaume Drolet <droletguillaume(at)gmail(dot)com>:
>
>> I tried starting the cluster again. Once again everything looked fine at
>> the start (first three lines of this log, in English this time):
>>
>> 2015-02-09 11:40:55 EST LOG: database system was shut down at 2015-02-06
>> 09:50:21 EST
>> 2015-02-09 11:40:55 EST LOG: database system is ready to accept
>> connections
>> 2015-02-09 11:40:55 EST LOG: autovacuum launcher started
>>
>> Since it seemed to work, I opened the terminal and tried connecting to
>> the database:
>>
>> C:\Users\admlocal>psql -U postgres -d mortalite
>>
>> So far so good, I got a connection:
>>
>> psql (9.3.5)
>> Attention : l'encodage console (850) diffère de l'encodage Windows (1252).
>> Les caractères 8 bits peuvent ne pas fonctionner correctement.
>> Voir la section « Notes aux utilisateurs de Windows » de la
>> page
>> référence de psql pour les détails.
>> Saisissez « help » pour l'aide.
>>
>> mortalite=#
>>
>> I tried the help command and it worked:
>>
>> mortalite=# help
>> Vous utilisez psql, l'interface en ligne de commande de PostgreSQL.
>> Saisissez:
>> \copyright pour les termes de distribution
>> \h pour l'aide-mémoire des commandes SQL
>> \? pour l'aide-mémoire des commandes psql
>> \g ou point-virgule en fin d'instruction pour exécuter la requête
>> \q pour quitter
>>
>>
>> But then when I tried to query the db, it crashed:
>>
>> mortalite=# \dt
>> la connexion au serveur a été coupée de façon inattendue
>> Le serveur s'est peut-être arrêté anormalement avant ou durant le
>> traitement de la requête.
>> La connexion au serveur a été perdue. Tentative de réinitialisation :
>> Échec.
>> !>
>>
>> And here's the rest of the log file after the crash:
>>
>> 2015-02-09 12:29:19 EST LOG: server process (PID 2240) was terminated by
>> exception 0xC0000005
>> 2015-02-09 12:29:19 EST DETAIL: Failed process was running: SELECT
>> n.nspname as "Schema",
>> c.relname as "Name",
>> CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm'
>> THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence'
>> WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
>> pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
>> FROM pg_catalog.pg_class c
>> LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
>> WHERE c.relkind IN ('r','')
>> AND n.nspname <> 'pg_catalog'
>> AND n.nspname <> 'information_schema'
>> AND n.nspname !~ '^pg_toast'
>> AND pg_catalog.pg_table_is_visible(c.oid)
>> ORDER BY 1,2;
>> 2015-02-09 12:29:19 EST HINT: See C include file "ntstatus.h" for a
>> description of the hexadecimal value.
>> 2015-02-09 12:29:19 EST LOG: terminating any other active server
>> processes
>> 2015-02-09 12:29:19 EST WARNING: terminating connection because of crash
>> of another server process
>> 2015-02-09 12:29:19 EST DETAIL: The postmaster has commanded this server
>> process to roll back the current transaction and exit, because another
>> server process exited abnormally and possibly corrupted shared memory.
>> 2015-02-09 12:29:19 EST HINT: In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2015-02-09 12:29:19 EST LOG: archiver process (PID 4576) exited with
>> exit code 1
>> 2015-02-09 12:29:19 EST LOG: all server processes terminated;
>> reinitializing
>> 2015-02-09 12:29:29 EST FATAL: pre-existing shared memory block is still
>> in use
>> 2015-02-09 12:29:29 EST HINT: Check if there are any old server
>> processes still running, and terminate them.
>>
>> According to this page
>> <https://msdn.microsoft.com/en-ca/library/cc704588.aspx>, exception
>> 0xC0000005 means STATUS_ACCESS_VIOLATION - The instruction at 0x%08lx
>> referenced memory at 0x%08lx. The memory could not be %s. This is not of
>> much help to me.
>>
>> I hope these additional bits of information can help someone figuring out
>> a solution to get my cluster up and running again.
>>
>> PS. I was thinking of reinstalling PGSQL over my current install but
>> keeping my PGDATA. I've done it in the past for fixing problems with
>> starting the service and it worked. What do you think?
>>
>>
> The SQL query you see in your log is the result of your \dt. It should
> work. I don't see why reinstalling PostgreSQL will fix anything here.
>
> You said you did a pg_basebackup. The cluster you try to start is the
> cluster used with pg_basebackup or a restore done with the pg_basebackup?
>
> Cheers,
>>
>> Guillaume
>>
>>
>>
>>
>>
>> 2015-02-09 11:37 GMT-05:00 Guillaume Drolet <droletguillaume(at)gmail(dot)com>:
>>
>>
>>>
>>> 2015-02-07 1:24 GMT-05:00 Guillaume Lelarge <guillaume(at)lelarge(dot)info>:
>>>
>>>> Le 6 févr. 2015 17:31, "Adrian Klaver" <adrian(dot)klaver(at)aklaver(dot)com> a
>>>> écrit :
>>>> >
>>>> > On 02/06/2015 05:03 AM, Guillaume Drolet wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> Yesterday I ran a pg_basebackup of my cluster. Since it has
>>>> completed,
>>>> >> my cluster doesn't work properly. I tried restarting the computer (or
>>>> >> service) a few times but I always get the same messages in my logs
>>>> (it's
>>>> >> in French. If someone is willing to help me I can try to translate
>>>> the
>>>> >> logs. Just ask):
>>>> >
>>>> >
>>>> > Enter Google Translate:)
>>>> >
>>>>
>>>> But first, Guillaume, do yourself and everyone else a favor: turn the
>>>> dam log into English. Set lc_messages to 'C' in postgresql.conf.
>>>>
>>> Thanks for this! I didn't know about this great feature.
>>>
>>>
>>>> > First some questions:
>>>> >
>>>> > 1) What Postgres version?
>>>> >
>>>> > 2) What OS(s)? I am assuming Windows from the log info below, but we
>>>> all know what assuming gets you.
>>>> >
>>>> > 3) Where were you backing up from and to?
>>>> >
>>>> > 4) Which cluster does not start, the master or the child you created
>>>> with pg_basebackup?
>>>> >
>>>> >
>>>> >>
>>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données a été
>>>> >> interrompu ; dernier lancement connu à 2015-02-06 07:05:05 EST
>>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données n'a pas
>>>> été
>>>> >> arrêté proprement ; restauration
>>>> >> automatique en cours
>>>> >> 2015-02-06 07:11:38 EST LOG: record with zero length at 24B/2C000160
>>>> >> 2015-02-06 07:11:38 EST LOG: la ré-exécution n'est pas nécessaire
>>>> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données est prêt
>>>> >> pour accepter les connexions
>>>> >> 2015-02-06 07:11:38 EST LOG: lancement du processus autovacuum
>>>> >> 2015-02-06 07:11:38 EST FATAL: le rôle « 208375PT$ » n'existe pas
>>>> >
>>>> >
>>>> > So where is role 208375PT$ supposed to come from?
>>>> >
>>>> >
>>>> >>
>>>> >> Then if I start pgAdmin I get a series of pop-ups I have to click OK
>>>> to
>>>> >> to continue:
>>>> >>
>>>> >> An error has ocurred: Column not found in pgSet: "datlastsysoid"
>>>> >> An error has ocurred: Column not found in pgSet: datlastsysoid
>>>> >> An error has ocurred: Column not found in pgSet: oid
>>>> >> An error has ocurred: Column not found in pgSet: encoding
>>>> >> An error has ocurred: Column not found in pgSet: Connection to
>>>> database
>>>> >> broken
>>>> >
>>>> >
>>>> > Not sure about that this, someone more versed in pgAdmin will have to
>>>> answer.
>>>> >
>>>>
>>>> Usually you see these messages when you're using a pgadmin major
>>>> release older than a PostgreSQL make release. For a 9.3 release, that would
>>>> mean a pgadmin older than 1.18.
>>>>
>>>
>>> I'm running pgadmin 1.18.1
>>>
>>>> >
>>>> >>
>>>> >> And after that, I went back to the log file and there's new
>>>> information
>>>> >> added:
>>>> >>
>>>> >> 2015-02-06 07:51:05 EST LOG: processus serveur (PID 184) a été
>>>> arrêté
>>>> >> par l'exception 0x80000004
>>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le processus qui a échoué exécutait
>>>> :
>>>> >> SELECT version();
>>>> >> 2015-02-06 07:51:05 EST ASTUCE : Voir le fichier d'en-tête C «
>>>> >> ntstatus.h » pour une description de la valeur
>>>> >> hexadécimale.
>>>> >
>>>> >
>>>> > Well according to here:
>>>> >
>>>> > https://msdn.microsoft.com/en-us/library/cc704588.aspx
>>>> >
>>>> > 0x80000004
>>>> > STATUS_SINGLE_STEP
>>>> >
>>>> >
>>>> > {EXCEPTION} Single Step A single step or trace operation has just
>>>> been completed.
>>>> >
>>>> > A developer is going to have explain what that means.
>>>> >
>>>> >
>>>> >
>>>> >> 2015-02-06 07:51:05 EST LOG: arrêt des autres processus serveur
>>>> actifs
>>>> >> 2015-02-06 07:51:05 EST ATTENTION: arrêt de la connexion à cause de
>>>> >> l'arrêt brutal d'un autre processus serveur
>>>> >> 2015-02-06 07:51:05 EST DÉTAIL: Le postmaster a commandé à ce
>>>> processus
>>>> >> serveur d'annuler la transaction
>>>> >> courante et de quitter car un autre processus serveur a quitté
>>>> >> anormalement
>>>> >> et qu'il existe probablement de la mémoire partagée corrompue.
>>>> >> 2015-02-06 07:51:05 EST ASTUCE : Dans un moment, vous devriez être
>>>> >> capable de vous reconnecter à la base de
>>>> >> données et de relancer votre commande.
>>>> >> 2015-02-06 07:51:05 EST LOG: processus d'archivage (PID 692) quitte
>>>> >> avec le code de sortie 1
>>>> >> 2015-02-06 07:51:05 EST LOG: tous les processus serveur se sont
>>>> >> arrêtés, réinitialisation
>>>> >> 2015-02-06 07:51:15 EST FATAL: le bloc de mémoire partagé
>>>> pré-existant
>>>> >> est toujours en cours d'utilisation
>>>> >> 2015-02-06 07:51:15 EST ASTUCE : Vérifier s'il n'y a pas de vieux
>>>> >> processus serveur en cours d'exécution. Si c'est le
>>>> >> cas, fermez-les.
>>>> >>
>>>> >> I was about to try restarting postgresql using the base backup I made
>>>> >> yesterday but since this means I'll have to copy my database again
>>>> (700
>>>> >> GB takes a while...) I am looking for a better solution from more
>>>> >> experienced people.
>>>> >
>>>> >
>>>> >
>>>> > My suspicion is you copied at least partly over a running server.
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Guillaume.
> http://blog.guillaume.lelarge.info
> http://www.dalibo.com
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeroen Ooms 2015-02-09 19:53:56 Building proper static library libpq.a
Previous Message Guillaume Lelarge 2015-02-09 17:51:33 Re: Cluster seems broken after pg_basebackup