Re: Anonymized database dumps

From: Marko Kreen <markokr(at)gmail(dot)com>
To: hari(dot)fuchs(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Anonymized database dumps
Date: 2012-03-19 09:41:16
Message-ID: 20120319094116.GA14674@gmail.com
Lists: pgsql-general

On Mon, Mar 19, 2012 at 10:12:01AM +0100, hari(dot)fuchs(at)gmail(dot)com wrote:
> Janning Vygen <vygen(at)kicktipp(dot)de> writes:
> > pgcrypto does not work for this scenario as far as I know.
> >
> > pgcrypto enables me to encrypt my data and let only a user with the
> > right password (or key or whatever) decrypt it, right? So if I run it
> > in a test environment without this password, the application is broken.
> >
> > I still want to use these table columns in my test environment, but
> > instead of real email addresses I want addresses like
> > random_number(at)example(dot)org.
> >
> > You might be right that it is a good idea to additionally encrypt this data.
>
> Maybe you could change your application so that it doesn't access the
> critical tables directly and instead define views for them which, based
> on current_user, either do decryption or return random strings.

Encryption is the wrong tool for "anonymization".

The right tool is hmac(), which gives you a one-way hash that
is protected by a key, which means the other side can't even
calculate the hashes unless they have the same key.

You can calculate it with pgcrypto when dumping,
or later when post-processing the dumps.
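
As a rough illustration of the post-processing route, here is a minimal
Python sketch using the standard-library hmac module. The key value, the
function name pseudonymize() and the choice of SHA-256 are assumptions
made for the example, not something specified in this thread.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; in practice it must stay
# outside the test environment, otherwise the hashes can be recomputed.
SECRET_KEY = b"change-me"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash of value as a hex string."""
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()
```

The same input and key always give the same output, so joins and
uniqueness constraints on the column survive. When doing it at dump time
instead, pgcrypto's hmac(data, key, type) should play the same role.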

But it produces random-looking values; if you need something
realistic-looking, you need custom mapping logic.
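
One possible shape for such mapping logic, sketched in Python under
assumptions of my own: the function name fake_email(), the "user_"
prefix, the example.org domain, the lowercasing of the input and the
truncation to 16 hex digits are all illustrative choices.

```python
import hashlib
import hmac


def fake_email(real_email: str, key: bytes) -> str:
    """Map a real address to a stable, syntactically valid fake one."""
    # Normalise first so "A@x.org" and "a@x.org" map to the same fake address.
    normalized = real_email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    # Keep 16 hex digits (64 bits) as the local part; this shortens the
    # address at the cost of a small, nonzero collision probability.
    return "user_" + digest[:16] + "@example.org"
```

The result still passes as an email address in the application, but it
can only be recomputed from the original by someone who has the key.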

--
marko
