Re: Cluster seems broken after pg_basebackup

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Guillaume Drolet <droletguillaume(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Cluster seems broken after pg_basebackup
Date: 2015-02-09 19:58:40
Message-ID: 54D91170.1000809@aklaver.com
Lists: pgsql-general

On 02/09/2015 08:34 AM, Guillaume Drolet wrote:

CCing list so the information stays in the thread.
>
>
> 2015-02-06 18:44 GMT-05:00 Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>:
>
> On 02/06/2015 09:17 AM, Guillaume Drolet wrote:
>
> Dear Adrian,
>
> Thanks for helping me. Sorry for the lack of details; I had told
> myself not to forget them, but I hit the send button too fast.
> You know how it is...
>
> I added more info in your reply below.
>
>
> First some questions:
>
> 1) What Postgres version?
>
>
> 9.3
>
>
> Windows 7
>
>
> 3) Where were you backing up from and to?
>
>
> Backing up from my only cluster (PGDATA) on disk E to a backup
> directory on another disk (F:) using this command:
>
> pg_basebackup -D "F:\\db_base_backup" -Fp -Xs -R -P
> --label="basebackup20150205" --username=postgres
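For reference, the same invocation can be spelled out flag by flag. This Python sketch is illustrative only: build_basebackup_cmd and run_basebackup are hypothetical helper names, and the comments summarize what each pg_basebackup switch does in 9.3. It only prints the command; actually running it requires a reachable server.

```python
import subprocess
from typing import List


def build_basebackup_cmd(target_dir: str, label: str, user: str) -> List[str]:
    """Return the pg_basebackup argument list used in this thread (hypothetical helper)."""
    return [
        "pg_basebackup",
        "-D", target_dir,     # destination directory; must be empty or not exist yet
        "-Fp",                # plain format: files laid out like a normal data directory
        "-Xs",                # stream WAL over a second connection while the backup runs
        "-R",                 # write a minimal recovery.conf into the copy (new in 9.3)
        "-P",                 # print progress reports
        f"--label={label}",   # label recorded in the backup_label file
        f"--username={user}",
    ]


def run_basebackup(target_dir: str, label: str, user: str) -> None:
    # check=True raises CalledProcessError if pg_basebackup exits non-zero
    subprocess.run(build_basebackup_cmd(target_dir, label, user), check=True)


if __name__ == "__main__":
    cmd = build_basebackup_cmd(r"F:\db_base_backup", "basebackup20150205", "postgres")
    print(" ".join(cmd))
```

Keeping the argument list separate from the subprocess call makes it easy to log or compare the exact command between a test run and a production run.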
>
> What's weird is that I ran some successful tests last week on the same
> system (backing up, archiving, recovering) using the same procedure. The
> only difference was the cluster, which was much smaller for testing
> purposes but located in the same place (i.e. E:\data), with PostgreSQL
> installed in C:\Programs\...
>
>
> 4) Which cluster does not start, the master or the child you created
> with pg_basebackup?
>
>
>
> The master. I haven't tried the child yet. But I saw that the message
> about role "208375PT$" is in logs from before the backup too.
>
>
> This is the local domain of my machine. I log onto my machine with a
> local admin account using the domain name 208375PT (I didn't set up this
> part of my machine; the IT staff here at work did). The thing is, I don't
> understand why it shows up in the log file.
>
>
> Not sure.
>
> What are you using for an authentication method for database login?
>

At this moment, for my tests I use md5 for user 'postgres' and trust for
user 'all'.

>
>
>
>
> After that, I went back to the log file and new information had been
> added:
>
> 2015-02-06 07:51:05 EST LOG: server process (PID 184) was terminated
> by exception 0x80000004
> 2015-02-06 07:51:05 EST DETAIL: Failed process was running:
> SELECT version();
> 2015-02-06 07:51:05 EST HINT: See C include file "ntstatus.h" for a
> description of the hexadecimal value.
>
>
> Well, according to this:
>
> https://msdn.microsoft.com/en-us/library/cc704588.aspx
>
> 0x80000004
> STATUS_SINGLE_STEP
>
>
> {EXCEPTION} Single Step A single step or trace operation has just
> been completed.
>
> A developer is going to have to explain what that means.
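As a side note, the NTSTATUS value itself can be split into its documented bit fields: severity in bits 31-30, customer flag in bit 29, facility in bits 27-16 and code in bits 15-0. The sketch below uses a hypothetical helper (decode_ntstatus, not part of PostgreSQL or Windows); it only shows that 0x80000004 is a warning-severity status with code 4, unlike an access violation (0xC0000005), which carries error severity. It says nothing about why the backend raised it.

```python
def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields (hypothetical helper)."""
    severities = {0: "success", 1: "informational", 2: "warning", 3: "error"}
    return {
        "severity": severities[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),       # bit 29: customer-defined code
        "facility": (value >> 16) & 0xFFF,           # bits 27-16
        "code": value & 0xFFFF,                       # bits 15-0
    }


if __name__ == "__main__":
    print(decode_ntstatus(0x80000004))  # STATUS_SINGLE_STEP from the log
    print(decode_ntstatus(0xC0000005))  # STATUS_ACCESS_VIOLATION, for comparison
```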
>
>
>
>
> My suspicion is you copied at least partly over a running
> server.
>
>
> How would that be possible? Using the pg_basebackup command I wrote
> above, it is clear that I wrote the backup on disk F and not E.
>
>
> I was just speculating, I would not put too much stock in it.
>
>
>
> While writing this post, I started my backup using:
>
> pg_ctl start -D "F:\db_basebackup"
>
> The same thing happened with pgAdmin, and this is what the log shows
> (the message about the symbolic link is related to my post from
> yesterday; I don't know whether it could be involved in the current
> problem):
>
> 2015-02-06 12:13:58 EST LOG: database system was interrupted; last
> known up at 2015-02-05 14:30:34 EST
> 2015-02-06 12:13:58 EST LOG: creating missing WAL directory
> "pg_xlog/archive_status"
> 2015-02-06 12:13:58 EST LOG: redo starts at 24B/28000090
> 2015-02-06 12:13:58 EST LOG: could not remove symbolic link
> "pg_tblspc/940585": No such file or directory
> 2015-02-06 12:13:58 EST CONTEXT: xlog redo drop tablespace: 940585
> 2015-02-06 12:13:58 EST LOG: consistent recovery state reached at
> 24B/290000B8
> 2015-02-06 12:13:58 EST LOG: redo done at 24B/290000B8
> 2015-02-06 12:13:58 EST LOG: last completed transaction was at log
> time 2015-02-05 09:06:04.892-05
> 2015-02-06 12:13:59 EST LOG: database system is ready to accept
> connections
> 2015-02-06 12:13:59 EST LOG: autovacuum launcher started
> 2015-02-06 12:14:42 EST LOG: server process (PID 1784) was terminated
> by exception 0x80000004
> 2015-02-06 12:14:42 EST DETAIL: Failed process was running:
> SELECT version();
> 2015-02-06 12:14:42 EST HINT: See C include file "ntstatus.h" for a
> description of the hexadecimal value.
> 2015-02-06 12:14:42 EST LOG: terminating any other active server
> processes
> 2015-02-06 12:14:42 EST WARNING: terminating connection because of
> crash of another server process
> 2015-02-06 12:14:42 EST DETAIL: The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2015-02-06 12:14:42 EST HINT: In a moment you should be able to
> reconnect to the database and repeat your command.
> 2015-02-06 12:14:42 EST LOG: all server processes terminated;
> reinitializing
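The redo messages above report WAL positions as LSNs written high/low in hex. The Python sketch below converts them to byte positions and to segment file names; it assumes the default 16 MB WAL segment size and timeline 1, neither of which is stated in the thread, and parse_lsn / wal_file_name are hypothetical helper names. On 9.3 the byte difference can also be computed in SQL with pg_xlog_location_diff().

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(lsn: str) -> int:
    """Convert an 'XXX/YYYYYYYY' WAL location into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    """Name of the 9.3-style WAL segment file holding this location.

    timeline=1 is an assumption for illustration only.
    """
    pos = parse_lsn(lsn)
    log_id = pos >> 32                                # high 32 bits
    seg = (pos & 0xFFFFFFFF) // WAL_SEGMENT_SIZE      # segment within that log id
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


if __name__ == "__main__":
    start = "24B/28000090"  # "redo starts at"
    end = "24B/290000B8"    # "consistent recovery state reached at" / "redo done at"
    print(parse_lsn(end) - parse_lsn(start))  # bytes of WAL replayed
    print(wal_file_name(start), wal_file_name(end))
```

For this log the replay covers 16777256 bytes (16 MB plus 40 bytes), spanning two segment files.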
>
>
> Any ideas where to go from here?
>
>
> In both cases the database got to the point below, which would seem
> to indicate everything was alright.
>
> 2015-02-06 07:11:38 EST LOG: redo is not required
> 2015-02-06 07:11:38 EST LOG: database system is ready to accept
> connections
>
> Also from what I can see the server crashed at this point:
>
> 2015-02-06 12:13:59 EST LOG: autovacuum launcher started
> 2015-02-06 12:14:42 EST LOG: server process (PID 1784) was terminated
> by exception 0x80000004
>
>
> Now 0x80000004 is supposed to mean:
>
> STATUS_SINGLE_STEP
>
>
> {EXCEPTION} Single Step A single step or trace operation has just
> been completed.
>
> Some digging indicates this is the result of a debugger command. I have
> no idea how that would be invoked in Postgres running production code.
> This leads to my default question when I see unexplained behavior on
> a Windows machine: do you have anti-virus software running against
> the drives?
>
>

Yes, I do, and I'm not allowed to turn it off (I don't have the
privileges). But the same anti-virus software runs on my other machine
(same setup) and I've never had such problems there. Even on this
machine, the one giving me problems, I spent the last two weeks testing
point-in-time recovery and everything went fine.

>
>
>
> Thanks a lot again.
>
>
> Thanks a lot for helping! Guillaume
>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>
>
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
