streaming replication and data file consistency

From: Matt Savona <matt(dot)savona(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: streaming replication and data file consistency
Date: 2012-10-22 14:31:17
Message-ID: CAKuu0C=uUYZUzxP6KcLNrx3KG0eKQOSg_SEMMLWpdaEufp=1tg@mail.gmail.com
Lists: pgsql-general

Hi all,

I am currently running PostgreSQL 9.2.1 with streaming replication: one
primary, one standby. Once an hour, a job compares
pg_current_xlog_location() on the primary against
pg_last_xlog_replay_location() on the standby to ensure the standby is not
lagging too far behind the primary. So far everything is working great.
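
A minimal sketch of the comparison step in the hourly check described
above, assuming the two location strings (in the usual "hi/lo" hexadecimal
form) have already been fetched from each server. The function names
lsn_to_int and replay_lag_bytes are hypothetical, not part of PostgreSQL.
The sketch treats a location as a plain 64-bit number. On releases before
9.3 this is, as far as I know, only an approximation, because the last
16 MB segment of each logical log file is skipped there, so the result can
be off by one segment per boundary crossed.

```python
def lsn_to_int(lsn):
    """Convert an xlog location such as '2/5A000060' to an integer.

    Assumption: the location is treated as a 64-bit value, with the
    part before the slash as the high 32 bits.
    """
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_lag_bytes(primary_lsn, standby_lsn):
    """Approximate number of bytes the standby's replay position trails
    the primary's current write position."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    print(replay_lag_bytes("2/5A000060", "2/59FFFF00"))
```

PostgreSQL 9.2 also provides pg_xlog_location_diff(), which does the same
subtraction on the server side.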

I noticed, however, that although the standby consistently keeps up with
the primary, the md5sums and modification timestamps of many of my data
files differ. For example:

PRIMARY

# stat pgsql/data/base/16385/17600
File: `pgsql/data/base/16385/17600'
Size: 3112960 Blocks: 6080 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 39167976 Links: 1
Access: (0600/-rw-------) Uid: ( 26/postgres) Gid: ( 26/postgres)
Access: 2012-10-22 10:05:29.314607927 -0400
Modify: 2012-10-22 09:48:03.770209170 -0400
Change: 2012-10-22 09:48:03.770209170 -0400

# md5sum pgsql/data/base/16385/17600
5fb7909ea14ab7aa9636b31df5679bd4 pgsql/data/base/16385/17600

STANDBY

# stat pgsql/data/base/16385/17600
File: `pgsql/data/base/16385/17600'
Size: 3112960 Blocks: 6080 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 134229639 Links: 1
Access: (0600/-rw-------) Uid: ( 26/postgres) Gid: ( 26/postgres)
Access: 2012-10-22 10:05:25.361235742 -0400
Modify: 2012-10-22 09:50:29.674567827 -0400
Change: 2012-10-22 09:50:29.674567827 -0400

# md5sum pgsql/data/base/16385/17600
9deeb7b446c12fbb5745d4d282113d3c pgsql/data/base/16385/17600
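
A whole-file md5sum only shows that the two copies differ, not by how
much. As a rough sketch, assuming both copies of a data file have been
placed on the same machine and that the cluster uses the default 8 kB
block size, the files could be compared page by page. The helper name
differing_pages is hypothetical.

```python
PAGE_SIZE = 8192  # assumption: default PostgreSQL block size


def differing_pages(path_a, path_b, page_size=PAGE_SIZE):
    """Return the zero-based numbers of the pages whose bytes differ
    between two copies of the same data file."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        page_no = 0
        while True:
            page_a = a.read(page_size)
            page_b = b.read(page_size)
            if not page_a and not page_b:
                break  # both files are exhausted
            if page_a != page_b:
                diffs.append(page_no)
            page_no += 1
    return diffs
```

Calling differing_pages() on the primary's and standby's copies of
base/16385/17600 would give a list of page numbers, which indicates
whether the divergence is confined to a few pages or spread across the
file.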

I ask because when both systems are healthy and I wish to swap primaries,
I bring the primary and the standby down and do a full rsync of the data/
directory from the old primary to the new primary. However, because the
data files differ, the rsync run takes a very long time.

My questions are:
1) While the xlog location between primary and standby remains
consistent, are the data files, internally, structured differently between
primary and standby?
2) Is this expected, and if so, what causes them to diverge?

Thanks in advance for helping me understand this behavior!

- Matt
