Re: Option for pg_dump to dump tables in clustered index order

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Timothy Garnett <tgarnett(at)panjiva(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Option for pg_dump to dump tables in clustered index order
Date: 2012-02-23 15:14:07
Message-ID: CAFNqd5XFdM3Qh98MqLRjiO0EcpPnRBcKA=e=weyM5zbA1ii8Dw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 6:17 PM, Timothy Garnett <tgarnett(at)panjiva(dot)com> wrote:
> I wanted to gauge the interest in adding an option for this to pg_dump.

I was thinking about an application for much the same feature.

Consider the case where you have a relatively small database such as
the accounting records for a not-hugely-active business.

And you'd like to handle backups by checking them into an SCM repository.

Thus...

#!/bin/sh
# Dump the database and commit the result, so each backup is a git commit.
cd "$HOME/GitBackup/Databases" || exit 1
pg_dump -h dbserver -p 5432 accounting > accounting.sql
git add accounting.sql
git commit -m "Latest backup" accounting.sql

If the database's tables have been clustered, then the order of the
data will tend to be consistent from one dump to the next, and the
differences between versions of "accounting.sql" will generally
reflect actual changes to the data.

If, on the other hand, the tables are not clustered, then dumps will
find tuples in a rather less predictable order, and successive
backups will show spurious differences.
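
For instance (the table and index names here are hypothetical), one
can rewrite a table in index order with CLUSTER, after which the next
pg_dump emits its rows in that order until enough churn accumulates:

psql -h dbserver -p 5432 -d accounting \
    -c 'CLUSTER accounting_entries USING accounting_entries_pkey'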

I was thinking about writing a script to run CLUSTER before doing
backups. Having that ordering step be part of pg_dump itself is
certainly an interesting idea.
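
Something along these lines, as a rough sketch (connection details as
in the script above; with no arguments, CLUSTER re-clusters every
table in the database that was previously clustered on an index):

#!/bin/sh
cd "$HOME/GitBackup/Databases" || exit 1
# Re-cluster previously clustered tables, then dump and commit.
psql -h dbserver -p 5432 -d accounting -c 'CLUSTER'
pg_dump -h dbserver -p 5432 accounting > accounting.sql
git add accounting.sql
git commit -m "Latest backup" accounting.sql
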
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
