Re: Possible solution for masking chosen columns when using pg_dump

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Олег Целебровский <oleg_tselebrovskiy(at)mail(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Possible solution for masking chosen columns when using pg_dump
Date: 2022-10-03 15:44:53
Message-ID: 20221003154453.sfhl54gb3ccxpscp@jrouhaud
Lists: pgsql-hackers

Hi,

On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote:
>
> Hello, here's my take on masking data when using pg_dump
>  
> The main idea is to use PostgreSQL functions to replace data during a SELECT.
> When table data is dumped, a query of the form SELECT a, b, c, d ... FROM ... is generated; the columns marked for masking are replaced with the result of calling functions on those columns.
> Example: if columns name and count are to be masked, the query will look like this: SELECT id, mask_text(name), mask_int(count), date FROM ...
>  
> So about the interface: I added 2 more command-line options: 
>  
> --mask-columns, which specifies what columns from what tables will be masked 
>     usage example:
>             --mask-columns "t1.name, t2.description" - both columns will be masked with the same corresponding function
>             or --mask-columns name - ALL columns named "name" from all dumped tables will be masked with the corresponding function
>  
> --mask-function, which specifies what functions will mask data
>     usage example:
>             --mask-function mask_int - corresponding columns will be masked with function named "mask_int" from default schema (public)
>             or --mask-function my_schema.mask_varchar - same as above but with specified schema where the function is stored
>             or --mask-function somedir/filename - the function is "defined" here - more on the structure below
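
To make the quoted idea concrete, here is a minimal sketch of how such a SELECT list could be built. It is not taken from the patch: the function name build_masked_select and its parameters are hypothetical, it assumes a single masking function for all selected columns, and it does no identifier quoting or validation.

```python
def build_masked_select(table, columns, mask_columns, mask_function):
    """Build a SELECT where masked columns are wrapped in mask_function.

    mask_columns is a comma-separated string as in the proposed
    --mask-columns option: each entry is either "table.column" or a bare
    "column", and a bare name matches that column in every table.
    This is an illustration only, not code from the patch.
    """
    targets = {entry.strip() for entry in mask_columns.split(",")}
    select_list = []
    for col in columns:
        if col in targets or f"{table}.{col}" in targets:
            # Masked column: replace it with a call to the masking function.
            select_list.append(f"{mask_function}({col})")
        else:
            # Unmasked column: dumped as-is.
            select_list.append(col)
    return f"SELECT {', '.join(select_list)} FROM {table}"


if __name__ == "__main__":
    print(build_masked_select("t1", ["id", "name", "date"],
                              "t1.name, t2.description", "mask_text"))
```

With the arguments above, only name is wrapped for table t1, because t2.description is qualified with a different table.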

FTR, I wrote a POC extension [1] last weekend that does the same thing, but on
the backend side. The main advantage is that it works with any existing version
of pg_dump (or any client relying on COPY, or even plain interactive SQL
statements), and that the DBA can force a dedicated role to only ever get a
masked dump, even if they forget to ask for it.

I only had a quick look at your patch, but it seems that you left some TODOs in
Russian, which isn't helpful, at least to me.

[1] https://github.com/rjuju/pg_anonymize
