Re: archive falling behind

From: German Becker <german(dot)becker(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Strahinja Kustudic <strahinjak(at)nordeus(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: archive falling behind
Date: 2013-04-26 19:29:47
Message-ID: CALyjCLu-siDm5FH12XhkhmVr746p6HX5+gt3t7OokDL3OG56nQ@mail.gmail.com
Lists: pgsql-admin

Actually this seems like a very strange filesystem/hardware problem. The WAL
segments keep "changing" even after I stopped the database and no one is
supposedly accessing it:

root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
6fd36722641dc2857bb950437c052fa3 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
26e9c82d123513528824bdf9815dbd2b 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
649111a77ac7ec26f4ddeed18e039faa 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# lsof 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
ac9ba79e672bc5df2c126044e9054ff7 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
8956e59a4542599e8ded7450b7cab5a6 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
514dccfe7f5df4c55747e14e6c13268f 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
f2c53795afcbc7c150443a3cdd3550bb 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
79687effd43c0e51a127a677e14a815c 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
51b66cd72ed3fb11aa57fab244696e0f 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# md5sum 000000010000001000000049
bf1a2ec5847c40a0b9200769cff601e4 000000010000001000000049

root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog# lsof 000000010000001000000049
root(at)lemur:/var/lib/postgresql/9.1/main/pg_xlog#
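
The manual check above (hashing the same segment over and over while lsof shows no process holding it open) could be scripted roughly as follows. This is only a sketch: the function name distinct_digests and the default of 10 runs are assumptions for illustration, not something from the session above.

```shell
#!/bin/bash
# Hash one file several times and print how many distinct digests were seen.
# distinct_digests is a hypothetical helper name; the default of 10 runs is arbitrary.
distinct_digests() {
    local file="$1"
    local runs="${2:-10}"
    local i=0
    # Reading from stdin keeps the file name out of md5sum's output,
    # so identical contents always produce identical lines.
    while [ "$i" -lt "$runs" ]; do
        md5sum < "$file"
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

Calling distinct_digests 000000010000001000000049 20 on a file that nothing is writing to should print 1. A larger number would suggest that the reads themselves are unreliable (memory, controller, disk, or filesystem) and not that PostgreSQL is modifying the segment.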

Maybe this is off-topic, but has anyone seen something like this? I'm on
Ubuntu 12.04. This is the hard drive mount line (the hard drive is used
exclusively for the pg_xlog directory):

/dev/sdb1 on /storage/sdb1 type ext4 (rw,noatime,errors=remount-ro)

Thanks!

On Fri, Apr 26, 2013 at 4:25 PM, German Becker <german(dot)becker(at)gmail(dot)com> wrote:

> Hi, I have reverted to cp as the archive command, but now under heavy load
> (> 150 WAL segments in a minute) some WAL segments get corrupted:
>
> postgres(at)lemur:~/9.1/main/pg_xlog$ md5sum 000000010000001000000049
> f1906d2745224430f811496df466203f 000000010000001000000049
> postgres(at)lemur:~/9.1/main/pg_xlog$ md5sum ~/backups/wal/000000010000001000000049
> 7e73fe759e41e427497360a815f9d3e1 /var/lib/postgresql/backups/wal/000000010000001000000049
>
>
>
>
>
> On Fri, Apr 26, 2013 at 10:55 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
>
>> German Becker wrote:
>> > Here is the archive part of the config:
>> >
>> > archive_mode = on # allows archiving to be done
>> > # (change requires restart)
>> > archive_command = '/var/lib/postgresql/scripts/archive_copy.sh %p %f' # command to use to archive a logfile segment
>> > #archive_timeout = 0 # force a logfile segment switch after this
>> > # number of seconds; 0 disables
>>
>> So the problem might be in that script.
>>
>> > The archive command makes a local copy and then copies it to the backup
>> > server via ssh. Both copies are md5-checked and retried up to 3 times in
>> > case of failure.
>>
>> archive_command should not retry the operation, but rather
>> return a non-zero return code.
>>
>> > I have seen under heavy load that some WALs are skipped, some are smaller
>> > than expected, and some are corrupted (i.e., the loop fails 3 times).
>> > I'm not sure about the return value (checking it). What is the expected
>> > behaviour of the archiver? Will it retry the archive if the archive
>> > command returns something other than 0? Will it retain the WAL segment
>> > until it is successfully archived?
>>
>> See
>> http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>>
>> archive_command should exit with zero only if the
>> WAL segment was archived successfully.
>> PostgreSQL will retry and retain the WAL segment until
>> archival succeeds.
>>
>> Yours,
>> Laurenz Albe
>>
>
>
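
For illustration, here is a minimal sketch of an archive script that follows the advice quoted above: it makes one attempt, verifies the copy, and returns non-zero on any failure so that PostgreSQL keeps the segment and retries later. It is not the archive_copy.sh discussed in the thread, whose contents are not shown. The function name archive_wal and the ARCHIVE_DIR variable are assumptions, and the ssh copy to the backup server is left out.

```shell
#!/bin/bash
# Hypothetical archive script sketch. ARCHIVE_DIR is an assumed location
# (the thread shows WAL copies under /var/lib/postgresql/backups/wal).
ARCHIVE_DIR="${ARCHIVE_DIR:-/var/lib/postgresql/backups/wal}"

archive_wal() {
    local src="$1"    # %p: path of the WAL segment to archive
    local name="$2"   # %f: file name of the segment
    local dest="$ARCHIVE_DIR/$name"
    local tmp="$dest.tmp"
    local src_sum dst_sum

    # Refuse to overwrite a segment that is already in the archive.
    if [ -e "$dest" ]; then
        echo "archive_wal: $dest already exists" >&2
        return 1
    fi

    # Copy under a temporary name so a partial copy is never mistaken
    # for a complete segment.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Compare checksums of source and copy. On mismatch, fail without
    # retrying here; the archiver will call the command again later.
    src_sum=$(md5sum < "$src") || { rm -f "$tmp"; return 1; }
    dst_sum=$(md5sum < "$tmp") || { rm -f "$tmp"; return 1; }
    if [ "$src_sum" != "$dst_sum" ]; then
        rm -f "$tmp"
        echo "archive_wal: checksum mismatch for $name" >&2
        return 1
    fi

    # Rename into place; the exit status of mv becomes the result.
    mv "$tmp" "$dest"
}

# Run only when invoked with the two arguments passed by archive_command.
if [ "$#" -eq 2 ]; then
    archive_wal "$1" "$2"
    exit $?
fi
```

Saved under a hypothetical path such as /var/lib/postgresql/scripts/archive_wal.sh, it would be referenced as archive_command = '/var/lib/postgresql/scripts/archive_wal.sh %p %f'. Note that a checksum comparison like this cannot help if reads of the source segment are themselves unstable, as in the md5sum output at the top of this message.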
