Re: naming of wal-archives

From: Neil Worden <nworden1234(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: naming of wal-archives
Date: 2013-01-31 09:48:03
Message-ID: CADZZMN-9S3G+iAGTW3hhe3RP-nr_LRCY4jHrx8qi4P=uU0DeNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Btw, ps shows:

postgres(at)darkblue:/data/pgdata/pg_xlog$ ps aux | grep post
postgres 11496 0.1 0.9 161018232 3696076 ? S Jan29 2:49 postmaster
-i -D /data/pgdata
postgres 11499 0.0 1.6 161097088 6450616 ? Ss Jan29 1:39 postgres:
checkpointer process
postgres 11500 0.0 0.3 161095036 1414612 ? Ss Jan29 0:12 postgres:
writer process
postgres 11501 0.0 0.0 161095036 17328 ? Ss Jan29 0:05 postgres:
wal writer process
postgres 11502 0.0 0.0 161096724 3112 ? Ss Jan29 0:11 postgres:
autovacuum launcher process
postgres 11503 0.0 0.0 20136 884 ? Ss Jan29 0:10 postgres:
archiver process last was 000000010000006E00000034
postgres 11504 0.0 0.0 20816 1412 ? Ss Jan29 0:53 postgres:
stats collector process
postgres 11507 0.0 0.0 161096264 2652 ? Ss Jan29 1:08 postgres:
wal sender process postgres 192.168.30.65(45640) streaming 8E/5544E650
postgres 11864 0.0 0.0 161096136 2656 ? Ss Jan29 0:51 postgres:
wal sender process postgres 192.168.10.95(37378) streaming 8E/5544E650
...

The archiver process says "last was 000000010000006E00000034" and when i
look into my wal-archive-directory i see:

-rw------- 1 postgres postgres 16777216 Jan 31 10:24
000000010000006E00000033
-rw------- 1 postgres postgres 16777216 Jan 31 10:24
000000010000006E00000034
-rw------- 1 postgres postgres 16777216 Jan 29 16:03
000000010000008C0000008E
-rw------- 1 postgres postgres 16777216 Jan 29 16:32
000000010000008C0000008F

The 6E..34 file was just written by the archiver process. But further down
at the same time this file was written:

...
-rw------- 1 postgres postgres 16777216 Jan 31 10:24
000000010000008E00000054

which seems to match the position of the streaming wal-senders .

Any ideas ?

Thanks, Neil.

2013/1/31 Neil Worden <nworden1234(at)gmail(dot)com>

> >>> If your command does overwrite, then the server currently emitting the
> >>> 8D files will become unrecoverable once those files start getting
> >>> overwritten. If it refuses to overwrite, but returns a zero status,
> >>> then the server currently emitting 6D would become unrecoverable once
> >>> it reaches 8D and its "archived" files are not actually being archived
> >>> but are getting deleted from the local pg_xlog anyway.
> >>
> >>
> >> Would it not be easier to archive the different servers to different
> >> directories and eliminate the possibility of name collision between
> servers?
>
> >Easier? I would say that that is the only sane way of doing it. I
> >was pointing out the consequences of messing it up. A proper
> >archive_command will save you from some self-inflicted disasters, but
> >that does not mean I'm recommending that you should invite those
> >disasters on yourself.
>
> >If the original author is in a production environment, he desperately
> >needs to figure out what is going on, especially so if archive_command
> >is not tested and verified to obey its contract.
>
> >Cheers,
>
> >Jeff
>
> Thanks for your answers.
>
> Yes, we are in a production environment and there are definitely no two
> masters writing to the wal-archive directory, at least none that i am aware
> of. And i can assure you that we are not intentionally working on inviting
> disasters either :-).
>
> Point is, the ..6D.. line started showing up exactly after i set up the
> archive-command (which is btw: archive_command = 'test ! -f
> /data/backup/walbackup/%f && cp %p /data/backup/walbackup/%f', running on
> Ubuntu Linux Server)
>
> The situation is as follows:
>
> All concerned machines are running 9.2.2 64-bit on Ubuntu Linux Server
> 12.10, installed from source, all following exactly the same procedure. We
> have a hot-standby running to a different location over a rather thin line
> running since version 9.1 came out. That worked
> flawlessly, we only were bitten by autovacuums to prevent XID wraparounds
> that generated relatively high wal-volume and we
> were not sure whether the network connection could keep up with it before
> deleting wal-files. Since we had to physically transfer a backup once for
> other reasons, we set wal_keep_segments to 8192 to have enough
> fallback-time.
>
> The total size of the database is currently at 313 GB and we are in the
> process of rethinking our backup/emergency-strategy (currently daily full
> dumps + hot-standby for read-only queries in the other location + one
> hot-standby in the same office).
>
> So we set up another machine, intended to become a another hot_standby
> later, but for now just use it to experiment with pg_receivexlog. Since our
> current backup-strategy does not allow for PIT-recovery and we have been
> thinking about wal-archiving all the time, we decided to implement it and
> thats what we did. Now we have the two "lines" in the archive-directory.
>
> Could the the high number of wal_keep_segments have an impact ?
> Does the fact that there already were a lot of existing wal-files when i
> set up archiving and the archive-command have an impact ?
>
> Jeff, you wrote:
>
> >> And how would i restore the needed file names for recovery
> >> if i decide to keep one base-backup und then a very long chain of
> wal-files
> >> ?
>
> >There should be no need for that.
>
> When you said there would be no need for that, did you mean restoring the
> files for recovery or keeping a base-backup and the chain of wal-files ?
>
> I understand that the archive-command is responsible for not overwriting
> wal-files. But if that situation occurs, and if i understand you correctly
> it will, what do i do ?
> If the wal-archive files will be overwritten at some point in time, how is
> it possible to only have one single base-backup at time-point t and from
> then on only store the following wal-files ( however inconvenient that may
> be ) to be able to restore to any point in time after time t ?
>
> Thanks,
>
> Neil
>
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Neil Worden 2013-01-31 09:53:27 Re: naming of wal-archives
Previous Message Alexander Farber 2013-01-31 09:01:14 Re: Optimizing select count query which often takes over 10 seconds