Quick Links

Re: Transparent column encryption

From:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Transparent column encryption
Date:	2022-06-28 23:29:57
Message-ID:	48a9f2c2-4a57-27d8-7c53-16a23a01014e@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Here is a new version of this patch. See also the original description
quoted below. I have done a significant amount of work on this over the
last few months. Some important news include:

- The cryptography has been improved. It now uses an AEAD scheme, and
for deterministic encryption a proper SIV construction.

- The OpenSSL-specific parts have been moved to a separate file in
libpq. Non-OpenSSL builds compile and work (without functionality, of
course).

- libpq handles multiple CEKs and CMKs, including changing keys on the fly.

- libpq supports a mode to force encryption of certain values.

- libpq supports a flexible configuration system for looking up CMKs,
including support for external key management systems.

- psql has a new \gencr command that allows passing in bind parameters
for (potential) encryption.

- There is some more pg_dump and psql support.

- The new data types for storing encrypted data have been renamed for
clarity.

- Various changes to the protocol compared to the previous patch.

- The patch contains full documentation of the protocol changes,
glossary entries, and more new documentation.

The major pieces that are still missing are:

- DDL support for registering keys

- Protocol versioning or feature flags

Other than that it's pretty complete in my mind.

For interested reviewers, I have organized the patch so that you can
start reading it top to bottom: The documentation comes first, then the
tests, then the code changes. Even some feedback on the first or first
two aspects would be valuable to me.

Old news follows:

On 03.12.21 22:32, Peter Eisentraut wrote:
> I want to present my proof-of-concept patch for the transparent column
> encryption feature. (Some might also think of it as automatic
> client-side encryption or similar, but I like my name.) This feature
> enables the {automatic,transparent} encryption and decryption of
> particular columns in the client. The data for those columns then
> only ever appears in ciphertext on the server, so it is protected from
> the "prying eyes" of DBAs, sysadmins, cloud operators, etc. The
> canonical use case for this feature is storing credit card numbers
> encrypted, in accordance with PCI DSS, as well as similar situations
> involving social security numbers etc. Of course, you can't do any
> computations with encrypted values on the server, but for these use
> cases, that is not necessary. This feature does support deterministic
> encryption as an alternative to the default randomized encryption, so
> in that mode you can do equality lookups, at the cost of some
> security.
>
> This functionality also exists in other SQL database products, so the
> overall concepts weren't invented by me by any means.
>
> Also, this feature has nothing to do with the on-disk encryption
> feature being contemplated in parallel. Both can exist independently.
>
> The attached patch has all the necessary pieces in place to make this
> work, so you can have an idea how the overall system works. It
> contains some documentation and tests to help illustrate the
> functionality. But it's missing the remaining 90% of the work,
> including additional DDL support, error handling, robust memory
> management, protocol versioning, forward and backward compatibility,
> pg_dump support, psql \d support, refinement of the cryptography, and
> so on. But I think obvious solutions exist to all of those things, so
> it isn't that interesting to focus on them for now.
>
> ------
>
> Now to the explanation of how it works.
>
> You declare a column as encrypted in a CREATE TABLE statement. The
> column value is encrypted by a symmetric key called the column
> encryption key (CEK). The CEK is a catalog object. The CEK key
> material is in turn encrypted by an assymmetric key called the column
> master key (CMK). The CMK is not stored in the database but somewhere
> where the client can get to it, for example in a file or in a key
> management system. When a server sends rows containing encrypted
> column values to the client, it first sends the required CMK and CEK
> information (new protocol messages), which the client needs to record.
> Then, the client can use this information to automatically decrypt the
> incoming row data and forward it in plaintext to the application.
>
> For the CMKs, the catalog object specifies a "provider" and generic
> options. Right now, libpq has a "file" provider hardcoded, and it
> takes a "filename" option. Via some mechanism to be determined,
> additional providers could be loaded and then talk to key management
> systems via http or whatever. I have left some comments in the libpq
> code where the hook points for this could be.
>
> The general idea would be for an application to have one CMK per area
> of secret stuff, for example, for credit card data. The CMK can be
> rotated: each CEK can be represented multiple times in the database,
> encrypted by a different CMK. (The CEK can't be rotated easily, since
> that would require reading out all the data from a table/column and
> reencrypting it. We could/should add some custom tooling for that,
> but it wouldn't be a routine operation.)
>
> The encryption algorithms are mostly hardcoded right now, but there
> are facilities for picking algorithms and adding new ones that will be
> expanded. The CMK process uses RSA-OAEP. The CEK process uses
> AES-128-CBC right now; a more complete solution should probably
> involve some HMAC thrown in.
>
> In the server, the encrypted datums are stored in types called
> encryptedr and encryptedd (for randomized and deterministic
> encryption). These are essentially cousins of bytea. For the rest of
> the database system below the protocol handling, there is nothing
> special about those. For example, encryptedr has no operators at all,
> encryptedd has only an equality operator. pg_attribute has a new
> column attrealtypid that stores the original type of the data in the
> column. This is only used for providing it to clients, so that
> higher-level clients can convert the decrypted value to their
> appropriate data types in their environments.
>
> Some protocol extensions are required. These should be guarded by
> some _pq_... setting, but this is not done in this patch yet. As
> mentioned above, extra messages are added for sending the CMKs and
> CEKs. In the RowDescription message, I have commandeered the format
> field to add a bit that indicates that the field is encrypted. This
> could be made a separate field, and there should probably be
> additional fields to indicate the algorithm and CEK name, but this was
> easiest for now. The ParameterDescription message is extended to
> contain format fields for each parameter, for the same purpose.
> Again, this could be done differently.
>
> Speaking of parameter descriptions, the trickiest part of this whole
> thing appears to be how to get transparently encrypted data into the
> database (as opposed to reading it out). It is required to use
> protocol-level prepared statements (i.e., extended query) for this.
> The client must first prepare a statement, then describe the statement
> to get parameter metadata, which indicates which parameters are to be
> encrypted and how. So this will require some care by applications
> that want to do this, but, well, they probably should be careful
> anyway. In libpq, the existing APIs make this difficult, because
> there is no way to pass the result of a describe-statement call back
> into execute-statement-with-parameters. I added new functions that do
> this, so you then essentially do
>
> res0 = PQdescribePrepared(conn, "");
> res = PQexecPrepared2(conn, "", 2, values, NULL, NULL, 0, res0);
>
> (The name could obviously be improved.) Other client APIs that have a
> "statement handle" concept could do this more elegantly and probably
> without any API changes.
>
> Another challenge is that the parse analysis must check which
> underlying column a parameter corresponds to. This is similar to
> resorigtbl and resorigcol in the opposite direction. The current
> implementation of this works for the test cases, but I know it has
> some problems, so I'll continue working in this. This functionality
> is in principle available to all prepared-statement variants, not only
> protocol-level. So you can see in the tests that I expanded the
> pg_prepared_statements view to show this information as well, which
> also provides an easy way to test and debug this functionality
> independent of column encryption.
>
> And also, psql doesn't use prepared statements, so writing into
> encrypted columns currently doesn't work at all via psql. (Reading
> works no problem.) All the test code currently uses custom libpq C
> programs. We should think about a way to enable prepared statements
> in psql, perhaps something like
>
> INSERT INTO t1 VALUES ($1, $2) \gg 'val1' 'val2'
>
> (\gexec and \gx are already taken.)
>
> ------
>
> This is not targeting PostgreSQL 15. But I'd appreciate some feedback
> on the direction. As I mentioned above, a lot of the remaining work
> is arguably mostly straightforward. Some closer examination of the
> issues surrounding the libpq API changes and psql would be useful.
> Perhaps there are other projects where that kind of functionality
> would also be useful.

Attachment	Content-Type	Size
v2-0001-Transparent-column-encryption.patch	text/plain	222.3 KB

In response to

Transparent column encryption at 2021-12-03 21:32:21 from Peter Eisentraut

Responses

Re: Transparent column encryption at 2022-07-05 10:54:13 from Peter Eisentraut

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2022-06-28 23:40:45	Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Previous Message	Andres Freund	2022-06-28 23:27:46	Re: Hardening PostgreSQL via (optional) ban on local file system access