Re: Run statements before pg_dump in same transaction?

From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Run statements before pg_dump in same transaction?
Date: 2017-03-24 13:18:40
Message-ID: 8a622e83cd86f2d35a79f4f8d5c65cd8@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

François Beausoleil asked:
> To that end, we'd like to anonymize the data before it leaves the database server.
>
> One solution we thought of would be to run statements prior to pg_dump, but within
> the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
...
> Is there a ready-made solution for this?

No - at least not with generating a dump and scrubbing *before* the data comes
out. Some other ideas:

* Periodically do a full dump to another database under your control, sanitize the
data, and make all dev dumps come from *that* database. Process roughly becomes:

* pg_dump herokudb | psql dev_temp
* <sanitize dev_temp>
* drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
* Devs can pg_dump devdb at will

That still moves your sensitive data to another server though, even temporarily.
Another approach is to use the -T flag of pg_dump to exclude certain tables.
Make modified copies of them on the server, then rename them after the dump
(or simply put them in a new namespace):

* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev' + || ...etc. <e.g. sanitize data>
* pg_dump -d <herokus DATABASE_URL> -T public.users > devs_use_this.pg

Then dev could do: set schema = dev, public;

Or you could simply move the sanitized table back:

alter table dev.users set schema public;

If you are going to rename, it may be simpler to not make an identical copy
of the affected tables (i.e. with indexes) but just a data-only copy:

create table dev.users as select * from public.users;
<sanitize dev.users>
pg_dump herokudb --schema-only | psql devdb
pg_dump herokudb --data-only -T public.users | psql devdb
psql devdb -c 'insert into public.users select * from dev.users'

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAljVHHsACgkQvJuQZxSWSshUbgCg7TzCkAzT4wKoKd5/2rruzLte
TJcAoI7AvGdGzlNp5b3N+LFJ9DWIZ8/C
=7heB
-----END PGP SIGNATURE-----

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2017-03-24 14:02:43 Re: postgres source code function "internal_ping" may be not right in some conditions
Previous Message lin 2017-03-24 09:03:54 postgres source code function "internal_ping" may be not right in some conditions