Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-08 18:39:44
Message-ID: 20190708183944.GB29202@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> On Mon, Jul 8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > > On Mon, Jul 8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > > When people are asking for multiple keys (not just for key rotation),
> > > > > they are asking to have multiple keys that can be supplied by users only
> > > > > when they need to access the data. Yes, the keys are always in the
> > > > > database, but the feature request is that they are only unlocked when the
> > > > > user needs to access the data. Obviously, that will not work for
> > > > > autovacuum when the encryption is at the block level.
> > > >
> > > > > If the key is always unlocked, there is questionable security value of
> > > > > having multiple keys, beyond key rotation.
> > > >
> > > > That is not true. Having multiple keys also allows you to reduce the
> > > > amount of data encrypted with a single key, which is desirable because:
> > > >
> > > > 1. It makes cryptanalysis more difficult
> > > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> > >
> > > What systems use multiple keys like that? I know of no website that
> > > does that. Your arguments seem hypothetical. What is your goal here?
> >
> > Not sure what the reference to 'website' is here, but one doesn't get
> > certificates for TLS/SSL usage that aren't time-bounded, and when it
> > comes to the actual on-the-wire encryption that's used, that's a
> > symmetric key that's generated on-the-fly for every connection.
> >
> > Wouldn't the fact that they generate a different key for every
> > connection be a pretty clear indication that it's a good idea to use
> > multiple keys and not use the same key over and over..?
> >
> > Of course, we can discuss if what websites do with over-the-wire
> > encryption is sensible to compare to what we want to do in PG for
> > data-at-rest, but then we shouldn't be talking about what websites do,
> > it'd make more sense to look at other data-at-rest encryption systems
> > and consider what they're doing.
>
> (I talked to Joe on chat for clarity.) In modern TLS, the certificate is
> used only for authentication, and Diffie–Hellman is used for key
> exchange:
>
> https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange

Right, and the key that's derived for each connection is at least
specific to the server AND client keys/certificates, meaning that it
changes at least as frequently as those change (and clients end up
generating one on the fly, randomly, if they don't have one, iirc).
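
To make the per-connection point concrete, here is a minimal sketch of
an ephemeral Diffie-Hellman exchange. The group parameters (P, G) are toy
values chosen for illustration and the function names are hypothetical;
this is not how any TLS library implements it, only the general shape.

```python
import hashlib
import secrets

# Toy parameters for illustration only (an assumption): a 127-bit
# Mersenne prime and base 3. Real TLS uses standardized, much larger
# finite-field groups or elliptic curves.
P = 2**127 - 1
G = 3

def ephemeral_keypair():
    """Return a fresh (private, public) pair for one connection."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P-2]
    public = pow(G, private, P)
    return private, public

def session_key(own_private, peer_public):
    """Hash the shared DH secret into a 32-byte symmetric key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def handshake():
    """Run one toy handshake; return the key computed by each side."""
    c_priv, c_pub = ephemeral_keypair()
    s_priv, s_pub = ephemeral_keypair()
    return session_key(c_priv, s_pub), session_key(s_priv, c_pub)
```

Both sides of a single handshake() arrive at the same key, while two
separate handshakes yield unrelated keys because the private values are
generated anew each time.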

> So, the question is whether you can pass so much data in TLS that using
> the same key for the entire session is a security issue. TLS originally
> had key renegotiation, but that was removed in TLS 1.3:
>
> https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> To mitigate these types of attacks, TLS 1.3 disallows renegotiation.

Renegotiation was removed due to attacks targeting the renegotiation
mechanism itself, not because re-keying was a bad idea, or because
using multiple keys was seen as a bad idea.

> Of course, a database is going to process even more data so if the
> amount of data encrypted is a problem, we might have a problem too in
> using a single key. This is not related to whether we use one key for
> the entire cluster or multiple keys per tablespace --- the problem is
> the same. I guess we could create 1024 keys and use the bottom bits of
> the block number to decide what key to use. However, that still only
> pushes the goalposts farther.

All of this is about pushing the goalposts farther away, as I see it.
There are going to be trade-offs here and there isn't going to be any "one
right answer" when it comes to this space. That's why I'm inclined to
argue that we should try to come up with a relatively *good* solution
that doesn't create a huge amount of work for us, and then build on
that. To that end, leveraging metadata that we already have outside of
the catalogs (databases, tablespaces, potentially other information that
we store, essentially, in the filesystem metadata already) to decide on
what key to use, and how many we can support, strikes me as a good
initial target.
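
As a rough sketch of what choosing a key from existing metadata could
look like: the code below derives one key per (database, tablespace)
pair from a single master key, using the HKDF construction from RFC 5869
with an all-zero salt. Deriving from a master key, the "tde:" label, and
the function names are all assumptions made for illustration, not
something proposed in this thread. The last function shows the quoted
"1024 keys by the bottom bits of the block number" variant.

```python
import hashlib
import hmac

def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and an all-zero salt."""
    # Extract: condense the master key into a pseudorandom key.
    prk = hmac.new(b"\x00" * 32, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def key_for(master_key: bytes, database_oid: int, tablespace_oid: int) -> bytes:
    """Hypothetical: one key per (database, tablespace) pair."""
    info = f"tde:{database_oid}:{tablespace_oid}".encode("ascii")
    return hkdf_sha256(master_key, info)

def key_slot_for_block(block_number: int, n_keys: int = 1024) -> int:
    """Pick one of n_keys (a power of two) from the low bits of the block number."""
    return block_number & (n_keys - 1)
```

Because the inputs are OIDs that are visible without reading the
catalogs, the key for a file can be worked out before any of its
contents are decrypted.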

> Anyway, I will research the reasonable data size that can be secured
> with a single key via AES. I will look at how PGP encrypts large files
> too.

This seems unlikely to lead to a definitive result, but it would be
interesting to hear if there have been studies around that and what
their conclusions were.
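
For a sense of scale, one commonly cited rule of thumb is the birthday
bound on the cipher's block size: with b-bit blocks, the chance that two
of n blocks collide is roughly n^2 / 2^(b+1). The sketch below only does
that arithmetic for AES's 128-bit block; the choice of acceptable
probability is an assumption left to the caller, and mode-specific
limits (for GCM, for example) are a separate consideration that this
does not cover.

```python
import math

AES_BLOCK_BITS = 128
AES_BLOCK_BYTES = AES_BLOCK_BITS // 8

def blocks_for_collision_probability(p: float) -> float:
    """Blocks n at which the approximation n^2 / 2^(b+1) reaches p."""
    return math.sqrt(p * 2.0 ** (AES_BLOCK_BITS + 1))

def bytes_for_collision_probability(p: float) -> float:
    """The same limit expressed in bytes of data under one key."""
    return blocks_for_collision_probability(p) * AES_BLOCK_BYTES
```

For example, bytes_for_collision_probability(2.0 ** -32) gives the
amount of data at which the estimated collision probability reaches
about one in four billion.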

When it comes to concerns about autovacuum or other system processes,
those don't have any direct user connections or interactions, so having
them be more privileged and having access to more is reasonable.

Ideally, all of this would leverage a vaulting system or other mechanism
which manages access to the keys and allows their usage to be limited.
That's been generally accepted as a good way to bridge the gap between
having to ask users every time for a key and having keys stored
long-term in memory. Having *only* the keys for the data which the
currently connected user is allowed to access would certainly be a great
initial capability, even if system processes (including potentially WAL
replay) have to have access to all of the keys. And yes, shared buffers
being unencrypted and accessible by every backend continues to be an
issue; it'd be great to improve on that situation too. I don't think
having everything encrypted in shared buffers is likely the solution.
Rather, segregating shared buffers might make more sense, again along
similar lines to the keys and using metadata that's outside of the
catalogs. That has been discussed previously, though I don't think
anyone's actively working on it.
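
The access model described above can be sketched as a small in-memory
stand-in for a vault. Everything here (the KeyVault class, the role
names, the key identifiers) is hypothetical and only illustrates the
idea that a user's backend receives just the keys it is granted, while
system processes receive all of them.

```python
from dataclasses import dataclass, field

# Hypothetical names for privileged system processes.
SYSTEM_ROLES = frozenset({"autovacuum", "wal_replay"})

@dataclass
class KeyVault:
    keys: dict = field(default_factory=dict)    # key_id -> key bytes
    grants: dict = field(default_factory=dict)  # role -> set of key_ids

    def add_key(self, key_id: str, key: bytes) -> None:
        self.keys[key_id] = key

    def grant(self, role: str, key_id: str) -> None:
        self.grants.setdefault(role, set()).add(key_id)

    def keys_for(self, role: str) -> dict:
        """Return only the keys this role may use."""
        if role in SYSTEM_ROLES:
            return dict(self.keys)  # system processes see every key
        allowed = self.grants.get(role, set())
        return {k: v for k, v in self.keys.items() if k in allowed}
```

A real vaulting system would also handle authentication, auditing, and
limiting how long an unlocked key stays in memory, none of which is
modeled here.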

Thanks,

Stephen
