Re: FW: Setting up of PITR system.

From: "Rajesh Kumar Mallah" <mallah(dot)rajesh(at)gmail(dot)com>
To: "Grega Bremec" <gregab(at)p0f(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: FW: Setting up of PITR system.
Date: 2006-04-03 18:29:53
Message-ID: a97c77030604031129m6e014adcu52df9eee26510ea1@mail.gmail.com
Lists: pgsql-admin

> | Do you see any problem in the current approach ?
> | i have seen it working fine till now.
>
> I do, to be honest. The WAL location counter accounts for 4294967295
> positions and while I'm certain that's WAY more than the average number
> of transactions that go into a WAL, quite a number of small ones can
> certainly happen before a WAL is rolled over, and until then, you're
> dealing with the same log file.
>
> If two backups happen in that period of time for whatever reason, you're
> going to have a false positive by looking into ${WAL_ARCHIVE} and
> searching just for the WAL name, so including the location in the search
> of a WAL fragment is certainly necessary. In fact, going purely by
> chance, the probability of hitting the same location in two different
> log files in two subsequent backups is much lower than hitting the same
> WAL twice.

Dear Grega,

Sincere thanks for your time.

The current WAL file is never removed from the WAL archive area;
only files that sort before it are rm'ed.
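
To make the pruning rule concrete, here is a minimal sketch of that
comparison. It assumes every archived file name begins with the
24-character WAL segment name (optionally followed by a suffix such as
.bz2). The helper name wal_is_older and the sample file names are only
for illustration.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when the archived file sorts strictly
# before the reference WAL segment name (plain string comparison).
wal_is_older() {
    local file_num=${1:0:24}   # first 24 chars = WAL segment name
    local ref=$2
    [[ $file_num < $ref ]]
}

# Reference segment as it would be read from backup_label.
ref=000000010000000E0000000A

# Made-up archive entries: one older, one equal, one newer.
for f in 000000010000000E00000009.bz2 \
         000000010000000E0000000A.bz2 \
         000000010000000E0000000B.bz2
do
    if wal_is_older "$f" "$ref"; then
        echo "$f would be removed"
    else
        echo "$f would be kept"
    fi
done
```

The segment named in backup_label compares equal to the reference, not
less, so it is always kept.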

I am sorry, I am not able to follow your concern, but I shall
surely try harder to understand your point.

Anyway, have a look at the current script, which has the following improvements.

1. Does some sanity checks on folder existence and permissions.
2. Accepts 3 mandatory args now:
   PGDATADIR, BACKUP DUMP FOLDER and WAL ARCHIVE AREA.
3. Uses readlink -f to resolve all the directories to be included in the base backup.
4. Attempts to locate psql and rsync on the system and bails out if not found.
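
Point 4 can be sketched roughly as follows. This is an illustration
rather than the exact code in the script: find_prog is a hypothetical
helper and command -v stands in for which. The fallback paths are the
ones the script itself uses.

```bash
#!/bin/bash
# Hypothetical helper: print the path of a program, preferring whatever
# is found in PATH and falling back to a fixed location. Returns 1 if
# neither candidate is an executable regular file.
find_prog() {
    local name=$1 fallback=$2 candidate
    for candidate in "$(command -v "$name")" "$fallback"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "Sorry, $name was not found in PATH or at $fallback" >&2
    return 1
}

# Probe both tools; a real script would exit 1 instead of just reporting.
for spec in psql:/usr/local/pgsql/bin/psql rsync:/usr/bin/rsync; do
    if prog=$(find_prog "${spec%%:*}" "${spec#*:}"); then
        echo "Using $prog"
    else
        echo "Would bail out here"
    fi
done
```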

Regarding:

> | 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR
>
> Perhaps moving the old log files into a father backup directory and
> having them stick around for a period of time before removing them isn't
> a bad idea either, just in case something goes wrong with your latest
> backup. You could go about that using find as well; see the -ctime
> predicate in find(1).

The old log files are not useful without the base backup. Since
rsync is being used to optimise the copying by overwriting the
base backup every time, I don't think preserving the old files
makes sense. Had it been a non-overwriting backup, keeping the files
would have made sense.

---------------- BEGIN -------------------------------------------------
#!/bin/bash

##################################################
# It does the following:
# 1. checks existence and permissions of important folders.
# 2. takes a base backup to a destination folder using rsync.
# 3. removes unwanted archived log files.
##################################################

if [ $# -ne 3 ]
then
echo "Usage: $0 <DATADIR> <BACKUP DIRECTORY> <WAL ARCHIVE DIRECTORY>"
exit 1
fi
DATADIR_IN=$1
BACKUPFOLDER=$2
WAL_ARCHIVE=$3

if [ -z "$BACKUPFOLDER" ] || [ ! -d "$BACKUPFOLDER" ] || [ ! -w "$BACKUPFOLDER" ]
then
echo "Sorry, base backup folder $BACKUPFOLDER does not exist, is not writable, or is not specified!"
exit 1
fi
if [ -z "$WAL_ARCHIVE" ] || [ ! -d "$WAL_ARCHIVE" ] || [ ! -w "$WAL_ARCHIVE" ]
then
echo "Sorry, WAL archive folder $WAL_ARCHIVE does not exist, is not writable, or is not specified!"
exit 1
fi
if [ -L $DATADIR_IN ]
then
DATADIR=`readlink -f $DATADIR_IN`
echo "Using $DATADIR instead of $DATADIR_IN as $DATADIR_IN is a link"
else
DATADIR=$DATADIR_IN
fi

# get all tablespaces from $DATADIR/pg_tblspc
DIRS=(`find $DATADIR/pg_tblspc -type l -exec readlink -f {} \;`)
# append DATADIR to it
DIRS=( "${DIRS[@]}" "$DATADIR" )

CTR=0
echo "Script shall backup following folders"
while [ -n "${DIRS[${CTR}]}" ]; do
echo "${DIRS[${CTR}]}"
CTR=$((CTR + 1))
done
unset CTR

# fall back to the usual install locations if not found in PATH
PSQL_BIN=`which psql` || PSQL_BIN=/usr/local/pgsql/bin/psql
RSYNC_BIN=`which rsync` || RSYNC_BIN=/usr/bin/rsync

for PROG in "$PSQL_BIN" "$RSYNC_BIN" ; do
if [ ! -f "$PROG" ] || [ ! -x "$PROG" ]
then
echo "Sorry $PROG does not exist or is not executable by you"
echo "Please set env variable PATH to include psql and rsync"
exit 1
else
echo "Using $PROG"
fi
done

RSYNC_OPTS="--delete-after -a --exclude pg_xlog"
RSYNC="$RSYNC_BIN $RSYNC_OPTS"
PSQL=$PSQL_BIN

today=`date +%d-%m-%Y-%H-%M-%S`
label=base_backup_${today}

echo "Executing pg_start_backup with label $label in server ... "

# get the checkpoint at which backup starts
# the .backup files seem to bear this string in them.

CP=`$PSQL -q -Upostgres -d template1 -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned`

RVAL=$?
if [ $RVAL -ne 0 ]
then
echo "PSQL pg_start_backup failed:$CP"
exit 1;
fi
echo "pg_start_backup executed successfully"

# read the backup_label file in pgdatadir and get the name of start wal file
# below is example content.
#START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
#CHECKPOINT LOCATION: E/A92939C
#START TIME: 2006-04-01 14:36:48 IST
#LABEL: base_backup_01-04-2006-14-36-45

# assuming pg_start_backup immediately puts backup_label in
# pgdatadir on finish.
BACKUP_LABEL=$DATADIR/backup_label
# get the line containing START WAL LOCATION
START_LINE=`grep -i "START WAL LOCATION" "$BACKUP_LABEL"`
# strip something like 'START WAL LOCATION: E/A9145E4 (file ' from the beginning.
START_LINE=${START_LINE/#START*file /}
# strip ')' from the end.
START_LINE=${START_LINE/%)/}

# REF_FILE_NUM is something like 000000010000000A00000068
REF_FILE_NUM=$START_LINE

echo "Content of $BACKUP_LABEL"
echo "------------- begin -----------"
cat $BACKUP_LABEL
echo "------------- end -----------"
echo "Read Start Wal as : $REF_FILE_NUM"

echo "RSYNC begins.."

# rsync each of the folders to the backup folder.
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
echo "Syncing ${DIRS[${CTR}]}..."
echo "Executing:${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER}"
time ${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER}
RVAL=$?
echo "Sync finished with exit status ${RVAL}"
if [[ ${RVAL} -eq 0 || ${RVAL} -eq 23 ]]; then
echo "Rsync success"
else
echo "Rsync failed"
$PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
exit 1
fi
CTR=$((CTR + 1))
done
unset CTR

echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
then
echo "PSQL pg_stop_backup failed"
exit 1;
fi
echo "pg_stop_backup done successfully"

echo "REF_FILE_NUM=$REF_FILE_NUM"

# iterate over the list of files in the WAL_ARCHIVE folder
for i in `ls -1 "$WAL_ARCHIVE"` ;
do
# $i is e.g. 000000010000000A0000005D.bz2
# get the first 24 chars of the filename
FILE_NUM=${i:0:24}

# compare if the number is less than the reference
# here string comparison is being used.
if [[ $FILE_NUM < $REF_FILE_NUM ]]
then
echo "$FILE_NUM [ $i ] removed"
rm -f "$WAL_ARCHIVE/$i"
else
echo "$FILE_NUM [ $i ] not removed"
fi
done
------------------ END -----------------------------------------------------
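
For reference, the start WAL file name can also be pulled out of
backup_label with sed instead of bash pattern substitution. This is only
a sketch, under the assumption that the file has the format shown in the
script comments above. start_wal_file is a hypothetical helper name and
not part of the script.

```bash
#!/bin/bash
# Hypothetical helper: read backup_label contents on stdin and print the
# 24-character start WAL file name. Prints nothing if no line matches.
start_wal_file() {
    sed -n 's/^START WAL LOCATION:.*(file \([0-9A-F]\{24\}\)).*$/\1/p'
}

# Sample contents copied from the example in the script comments.
start_wal_file <<'EOF'
START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
CHECKPOINT LOCATION: E/A92939C
START TIME: 2006-04-01 14:36:48 IST
LABEL: base_backup_01-04-2006-14-36-45
EOF
```

If the line is missing or malformed the helper prints nothing, so the
caller can check for an empty result before removing anything from the
WAL archive.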
