From: | "Greg Sabino Mullane" <greg(at)turnstep(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Run statements before pg_dump in same transaction? |
Date: | 2017-03-24 13:18:40 |
Message-ID: | 8a622e83cd86f2d35a79f4f8d5c65cd8@biglumber.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
François Beausoleil asked:
> To that end, we'd like to anonymize the data before it leaves the database server.
>
> One solution we thought of would be to run statements prior to pg_dump, but within
> the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
...
> Is there a ready-made solution for this?
No - at least not with generating a dump and scrubbing *before* the data comes
out. Some other ideas:
* Periodically do a full dump to another database under your control, sanitize the
data, and make all dev dumps come from *that* database. Process roughly becomes:
* pg_dump herokudb | psql dev_temp
* <sanitize dev_temp>
* drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
* Devs can pg_dump devdb at will
That still moves your sensitive data to another server though, even temporarily.
Another approach is to use the -T flag of pg_dump to exclude certain tables.
Make modified copies of them on the server, then rename them after the dump
(or simply put them in a new namespace):
* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev' + || ...etc. <e.g. sanitize data>
* pg_dump -d <herokus DATABASE_URL> -T public.users > devs_use_this.pg
Then dev could do: set schema = dev, public;
Or you could simply move the sanitized table back:
alter table dev.users set schema public;
If you are going to rename, it may be simpler to not make an identical copy
of the affected tables (i.e. with indexes) but just a data-only copy:
create table dev.users as select * from public.users;
<sanitize dev.users>
pg_dump herokudb --schema-only | psql devdb
pg_dump herokudb --data-only -T public.users | psql devdb
psql devdb -c 'insert into public.users select * from dev.users'
- --
Greg Sabino Mullane greg(at)turnstep(dot)com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAljVHHsACgkQvJuQZxSWSshUbgCg7TzCkAzT4wKoKd5/2rruzLte
TJcAoI7AvGdGzlNp5b3N+LFJ9DWIZ8/C
=7heB
-----END PGP SIGNATURE-----
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2017-03-24 14:02:43 | Re: postgres source code function "internal_ping" may be not right in some conditions |
Previous Message | lin | 2017-03-24 09:03:54 | postgres source code function "internal_ping" may be not right in some conditions |