pg_anonymize, a new extension for simple and transparent data anonymization

From: Julien Rouhaud via PostgreSQL Announce <announce-noreply(at)postgresql(dot)org>
To: PostgreSQL Announce <pgsql-announce(at)lists(dot)postgresql(dot)org>
Subject: pg_anonymize, a new extension for simple and transparent data anonymization
Date: 2023-03-09 17:26:13
Message-ID: 167838277397.1803488.6057055767941299255@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-announce

Taipei, Taiwan - March 9, 2023

pg_anonymize, a new extension for simple and transparent data anonymization
=======================================================================

I'm pleased to announce the beta version of
[pg_anonymize](https://github.com/rjuju/pg_anonymize).

pg_anonymize is a PostgreSQL extension that provides simple, robust and
transparent infrastructure for data anonymization. Its goal is to ensure that
anyone connected with an anonymized role will only ever see the anonymized
version of the data without any restriction on the client used (could be psql,
pg_dump, your own application or even other tools like pg_sample...) or the
number of schemas and relations.

Anonymization is done using a declarative approach, relying on SECURITY LABELs:
you only have to declare an expression for each column that should be
anonymized that will dynamically perform the anonymization. Any valid
expression yielding the correct type can be used, so you can write them in
plain sql, plpgsql or any other procedural language of your choice.

This extension is compatible with PostgreSQL 10 and above. For more details on
its usage please look at the [usage](#Usage) section below or the
[documentation](https://github.com/rjuju/pg_anonymize/blob/main/README.md).

Feel free to [open issue](https://github.com/rjuju/pg_anonymize/issues) if you
encounter any problem or want to ask for new features.

Configuration
-------------

pg_anonymize provides the following configuration options:

- **pg_anonymize.enabled** (bool): allows to globally enable or disable
pg_anonymize. The default value is **on**.

- **pg_anonymize.check_labels** (bool): perform sanity checks (expression
validity, read-only, returned type and lack of SQL injection) on the defined
expression when declaring security labels. The default value is **on**.

NOTE: even if **pg_anonymize.check_labels** is disabled, pg_anonymize will
still check that the defined expression doesn't contain any SQL injection.

Usage
-----

pg_anonymize must be loaded before being able to use it. There are multiple
ways to do it. Usually, only a few roles should require data anonymization, so
the recommended way is to only load the extension for such roles. For
instance, assuming the role **alice** should be used:

```
ALTER ROLE alice SET session_preload_libraries = 'pg_anonymize';
```

NOTE: only sessions opened by alice **after** this command has been
successfully run will load pg_anonymize.

You can alternatively load it explicitly, for instance:

```
LOAD 'pg_anonymize';
```

NOTE: LOAD requires superuser privileges.

You then need to declare the wanted role(s) as needing anonymized data. This
is done adding a SECURITY LABEL **anonymize** on the target role(s). For
instance:

```
-- pg_anonymize need to be loaded before declaring SECURITY LABEL
LOAD 'pg_anonymize';
SECURITY LABEL FOR pg_anonymize ON ROLE alice IS 'anonymize';
```

NOTE: declaring a SECURITY LABEL on a role requires CREATEROLE privilege.

You can then declare how to anonymize each column with SECURITY LABELS,
defining an expression to replace the actual content.

For instance, assuming a simplistic customer table:

```
CREATE TABLE public.customer(id integer,
first_name text,
last_name text,
birthday date,
phone_number text);

INSERT INTO public.customer VALUES (1, 'Nice', 'Customer', '1970-03-04', '+886 1234 5678');

GRANT SELECT ON TABLE public.customer TO alice;
```

Let's anonymize the last name, birthday and phone number:

```
SECURITY LABEL FOR pg_anonymize ON COLUMN public.customer.last_name
IS $$substr(last_name, 1, 1) || '*****'$$;
SECURITY LABEL FOR pg_anonymize ON COLUMN public.customer.birthday
IS $$date_trunc('year', birthday)::date$$;
SECURITY LABEL FOR pg_anonymize ON COLUMN public.customer.phone_number
IS $$regexp_replace(phone_number, '\d', 'X', 'g')$$;
```

NOTE: declaring a SECURITY LABEL on a column requires to be owner of the
underying relation.

The **alice** role will now automatically see anonymized data. For instance:

```
-- current role sees the normal data
=# SELECT * FROM public.customer;
id | first_name | last_name | birthday | phone_number
----+------------+-----------+------------+----------------
1 | Nice | Customer | 1970-03-04 | +886 1234 5678
(1 row)

-- but alice will see anonymized data
=# \c - alice
You are now connected to database "rjuju" as user "alice".

=> SELECT * FROM public.customer;
id | first_name | last_name | birthday | phone_number
----+------------+-----------+------------+----------------
1 | Nice | C***** | 1970-01-01 | +XXX XXXX XXXX
(1 row)

-- pg_dump will also see anonymized data
$ pg_dump -U alice -t public.customer -a rjuju | grep "COPY" -A2
COPY public.customer (id, first_name, last_name, birthday, phone_number) FROM stdin;
1 Nice C***** 1970-01-01 +XXX XXXX XXXX
\.
```

Browse pgsql-announce by date

  From Date Subject
Next Message pgAdmin Development Team via PostgreSQL Announce 2023-03-09 17:26:26 pgAdmin 4 v6.21 Released
Previous Message Swiss PostgreSQL Users Group via PostgreSQL Announce 2023-03-07 08:38:20 Swiss PGDay 2023 Announcement