From: | "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) |
Date: | 2019-08-06 22:13:30 |
Message-ID: | 00edeaf1-74e7-90ab-c24d-eb5ee2be857c@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 8/6/19 3:01 PM, Bruce Momjian wrote:
> On Tue, Aug 6, 2019 at 01:55:38PM -0400, Bruce Momjian wrote:
>> CTR mode creates a bit stream for the first 16 bytes with nonce of
>> (segment_number, counter = 0), and the next 16 bytes with
>> (segment_number, counter = 1), etc. We only XOR using the parts of the
>> bit stream we want to use. We don't care what the WAL content is --- we
>> just XOR it with the stream with the matching counter for that part of
>> the WAL.
>
> The diagram which is part of this section might be helpful:
>
> https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)
> https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#/media/File:CTR_encryption_2.svg
This is going to be a slightly long (understatement) email that I
thought would be easier to try to communicate all in one place vs.
replying to individual parts on this long thread. My main goal was to
present some things I had researched on TDE, some of which had been
mentioned on thread, and compile it in one place (it's also why I was
slow to respond on some other things on the thread -- sorry!)
While compiling this note, one idea that came to mind is that given the
complexity of this topic and getting the key pieces (no pun intended)
correct, it may be worthwhile if the people interested in working on TDE
make some time where we can all either get together and/or set up a
group call where we hash out the architecture we want to build to ensure
we have a fundamentally sound implementation (even if it takes a few
versions to implement it fully).
Since my last note, I really dove in to understand what other RDBMS
system are doing and what we can learn from them as well as what would
make sense in PostgreSQL. In particular, one goal is to be able to build
a TDE system that satisfies keeping data both confidential and
integrable while at rest while minimizing the amount of overhead we
introduce into the system.
Additionally, I would strongly suggest, if not require, that what we
build follows guidelines such as those outlined by NIST, as failure to
do so could end up that some teams would be unable to utilize our TDE
solution. And of course, mitigate the risk that we introduce security
vulnerabilities :)
(I've also continued to build out my terrible prototype to experiment
with some of the methods suggested. It's written in Python and leverages
the "cryptography"[0] library [which it itself has some good
recommendations on how to use its various parts) and still not worth
sharing yet (though happy to share if asked off-list -- you will be
underwhelmed].)
Below I outline some of my findings from looking at other systems,
looking at our own code, and make recommendations to the best of my
abilities on this matter. I broke it up into these 3 sections, which are
interspersed with research and recommendations:
1. Encryption Key Management
2. Encryption/Decryption of Pages + WAL Records
3. Automated Key Rotation
Of course, they are tightly intertwined, but thought it would be easier
to look at it in this way. It does stop short of certain implementation
details.
Anyway, without further ado:
#1 Encryption Key Management
----------------------------
While I thought what we had proposed on list (KEK, MDEK, TDEK, WDEK)
system made a lot of sense, I still decided to look at what other
systems did. In particular, I looked at SQL Server and the "redwood
city" database.
It turns out the SQL Server has a somewhat similar architecture to what
we propose[1]:
- A Service Master Key (equivalent to the key-encrypting key KEK) is
created when SQL Server is setup (equivalent to initdb)
- SQL Server uses Windows facilities to protect the master cluster key.
We are trying to achieve this with our KEK, which are saying we would
store in the GUC `tde_passphrase` (or equivalent, I forget the name we
proposed, if we proposed one. Sorry.). If we wanted to protect it at the
OS level, Peter Eisentraut shows a way to do this equivalent with the
"ssl_passphrase" GUC and systemd[2]
- SQL Server lets you create a "master key" for all the clusters, explicitly
- Here is where the differences start: SQL Server then lets you create a
certificate that can be protected by the master key.
- For the specific databases in the SQL Server cluster, you can then
create a "database encryption key" (they call it "DEK") that is used to
encrypt the data, and ask that said key is protected by a certificate.
(For SQL Server, it appears that TDE is all or nothing within a
database, but *not* within the entire cluster.)
SQL Server does support a multi-key approach which allows for, amongst
other things, key rotation. For what we are looking to do, it may be too
much, but it seems similar to the ideas proposed.
The redwood city database also has something similar to what we are
proposing[3]:
Basically:
- A master key (equiv to our MDEK) is managed by an "external
security module".
- Each tablespace has a key that can be encrypted/decrypted by the
MDEK. This would be similar to our TDEK.
The keys are secured in a wallet[8].
This is a long way of saying that we are in the right direction
architecture-wise.
Here are some thoughts on that. Using terms defined in Joe's email[4].
Note I am assuming we are going with AES-256 (and a bit more on this later).
0. First, proposing the KEK he defined is the MDEK. More on that in
a second.
1. To have a MDEK stored in a GUC, we'd have to have people
comfortable storing with random bytes in a GUC. It may be better to either:
a) Allow for the GUC to point to a file that has the MDEK stored
b) Allow the GUC to be a passphrase that can unlock the MDEK,
which is stored internally in the catalog.
For a), we might want to provide some facilities for the user to
generate a MDEK, which there have been suggestions to do that from initdb.
2. We need to ensure we allow the user to cleanly rotate the MDEK.
Just updating the passphrase would be no good, as they'd basically lock
themselves out of all of their data :) This was proposed on the list,
but would need to ensure we can handle this in a tidy way with what we
can control in the configuration.
3. We then have a "database encryption key" (DDEK) which is
presumably generated on CREATE DATABASE (as I am not sure I've seen that
explicitly stated anywhere). This is encrypted by the MDEK using a
padded key-wrapping function[5].
4. Each relation is encrypted by a "table encryption key" (TDEK)
that is generated by a HMAC-based key deriviation function (HKDF) using
SHA-256 and the DDEK. Guidelines for how to safely generate this are
present in NIST 800-56C[6]. We would want a two-step key derivation,
which would include[6]:
a) A cryptographically-secure randomly generated salt of 64
bytes[6] (Section 5; Table 4) that is created when the relation is created.
b) Some "fixed info" bit string which is likely to be composed
of the relation OID (4 bytes) and a "key identifier" (which I have
pegged at 1 bytes, but more on this later)
5. Last but not least, the "WAL encrpytion key" (WDEK). I have
looked at this one the least as I was focused on the pages, but this is
also one where we can use a HKDF. However, I believe the difference is
we need to use the **MDEK** here to generate the WDEK, which would include:
a) A cryptographically-secure randomly generated salt of 64
bytes[6] (Section 5; Table 4) that is created when the **cluster** is
initialized.
b) A counter bit string, that is incremented each time we've
encrypted more than 64GB of WAL. (More in key management)
One open question is do we allow users to explicitly add their own DDEK
or TDEK? I would say yes for DDEK (given it would be stored anyway), no
for TDEK for now, if only because of the potential desire to
automatically rotate the TDEK once 64GB of data has been encrypted by
the same TDEK (more on this in a bit). It seems, if we follow the SQL
Server example, the level of granularity it goes down to is the
database, so we may be safe there. Redwood city lets you select the
tablespace to use TDE, so we can be mindful of that as well.
I well understand the nuances are in the details, but this should build
on good practices started by other RDBMS, guidelines recommended by
NIST, and provide something that is relatively easy for our users to
implement.
#2 Encryption/Decryption of Pages + WAL Records
-----------------------------------------------
To me there are two goals with this part:
1. Ensure confidentiality of the data, in particular, if the storage
device is removed
2. Ensure the integrity of the data should someone tamper with the
ciphertext
The other thing to consider (per this discussion) is doing so in a way
where we can add minimal overhead to PostgreSQL, both in terms of
additional storage. (Though if you want TDE you are already accepting
some overhead.)
With that all said, I first wanted to understand what other RDBMS did
(surprise) for their encryption algorithms and modes.
I had a bit of trouble trying to figure out what SQL Server did, but if
this[7] is still accurate, it's a combination of AES-{128,192,256} and
the CBC mode. They provide an integrity check with a SHA1 hash.
With redwood city, they appear to support a few algorithms and modes[9]
and I did not narrow down which is used by default[10]. They list out
CBC, CFB, ECB, and OFB, and also indicate they use a SHA1 hash to
provide the integrity check.
This lead me to an exploration of all the different modes we could
utilize and what would make the most sense for PostgreSQL. Depending on
what you read, you could easily get your head (or your drives)
spinning[11][12][13][14].
However, while reading all of these, a few themes emerged:
1. Unless you're using authentication encryption[15], your data is
subject to tampering (This has been brought up on the list).
2. All of modes have tradeoffs given they are focused on
performance of encrypting/decrypting
3. If you're using anything with counters, you MUST choose a secure
counting method and NEVER repeat the counter with the same key. This has
also been brought up on the list.
(It goes without saying, but we also want to ensure we use something
that is in the clear on intellectual property, which you could run into
with some modes)
No matter which mode we choose, we will have to store an integrity
check, and the exploration I began was how we could do this while taking
up the least amount of space. But we _cannot_ compromise on the
integrity check, otherwise any encrypted data is subject to tampering. I
know I read this in one of the guidelines somewhere, but I can't seem to
find it at the moment I'm typing this.
I believe the mode we are looking for is GCM[16] which has similar
benefits to CTR (parallelism) but allows for an integrity check (via its
GMAC). And if you follow NIST 800-38d[17] (below for convenience), you
can actually create IVs deterministically! It also has the added benefit
where you can associate additional, unencrypted data as part of the
decryption process (a nice example of how to do so in the Python
cryptography library[18]).
Here are the guidelines:
https://csrc.nist.gov/publications/detail/sp/800-38d/final
A lot of what we are concerned about is in Section 8. For IV construction:
- The size they recommend is 96 bits. If the IV is 96 bits or less,
there is guidance for deterministic construction (Sec 8.2.1)
- An IV can have a "fixed" field and an "invocation" field. The
"fixed" field could be something like the relation OID (32-bits). The
invocation field could be the page number (32-bits). 64-bit IVs are okay
-- they recommend 96 bits.
^ The beauty of this is if we decide we are ok with 64-bit IVs, we
do not need to store any additional data :) If we want 96 bits we'd like
need to store a 4 byte "salt" if you will. Or we could leave it blank in
case we ever have more than 2**32 pages (*hides*).
- You cannot repeat an IV for a given key.
- Part of the output of the encryption is a tag, which can be
anywhere from 32 - 128 bits in length. The recommendation is to use 128
bits. This would then be the only additional storage we need. (If
smaller tags are to be considered, there are limits to the number of
invocations of the decryption function).
This means at most we only need to add 8 additional bytes of storage on
the pages + WAL headers.
However, even if we were to use a 96-bit IVs with 64-bits variable
space, we would need to eventually rotate keys (esp. WAL). As such:
# 3. Automated Key Rotation
---------------------------
If need to rotate an encryption key once we are at the point where a
counter will reset, that means we also need to know which key encrypted
the function. If we decide that 64GB is our magic number, based on some
back of the envelope calculation with an assist from a nice blog post by
Simon[19], I determined that each relation would require a maximum of
512 keys (32TB / 64GB). Using the HKDF method of deriving keys, we can
just use a byte to store which "key" should be used to encrypt/decrypt a
particular page. I believe would be similar for WAL.
BUT...AIUI that 64GB limit is for a key/IV pair. If each page has its
own IV and our counter is tied to the total number of pages available in
a relation, do we need to rotate that key? (I apologize at this point,
I've been writing this note for a couple of hours and my brain is
getting a bit mushy, so I'm fine with being told I'm way off base).
WDEK is a different, as we know the counter can wrap. As such, we likely
need to keep some length of a "key id" (4 bytes?) on the WAL to know
which WDEK was used. The key id would be passed in as part of the
"counter bit string" to the HKDF. We want to make it large enough that
the probability of wrapping around is low, and we are within the
guidelines for using a HKDF properly.
---------------
At this point I'm running out of steam as I type this. I realize overall
there is a lot to consider.
I'll also make the suggestion again that perhaps the people who are
interested in working on TDE have a discussion/meetup etc. to iron out
details.
If we can build a system that's architecturally sound (read: no one can
file a CVE on the architecture), adheres to guidelines for teams that
opt to utilize things like TDE, stores data confidentially and with
integrity (where we can detect tampering), and is easy(-ish) to use,
we'll be in a good position :)
Thanks,
Jonathan
[0] https://cryptography.io/en/latest/
[1]
https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-2017
[2]
https://www.2ndquadrant.com/en/blog/postgresql-passphrase-protected-ssl-keys-systemd/
[3]
https://docs.oracle.com/database/121/ASOAG/introduction-to-transparent-data-encryption.htm#ASOAG10139
[4]
https://www.postgresql.org/message-id/c878de71-a0c3-96b2-3e11-9ac2c35357c3%40joeconway.com
[5] https://tools.ietf.org/html/rfc5649.html
[6] https://csrc.nist.gov/publications/detail/sp/800-56c/rev-1/final
[7]
https://blogs.msdn.microsoft.com/sqlsecurity/2009/03/30/sql-server-encryptbykey-cryptographic-message-description/
[8]
https://docs.oracle.com/cd/B19306_01/network.102/b14268/asotrans.htm#BABGHIDE
[9]
https://docs.oracle.com/database/121/DBSEG/data_encryption.htm#DBSEG80084
[10]
https://docs.oracle.com/cd/B19306_01/network.102/b14268/asotrans.htm#BABHJCHD
[11]
https://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html
[12]
https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html#encryption-modes-and-usage
[13] https://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/
[14]
https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption
[15] https://en.wikipedia.org/wiki/Authenticated_encryption
[16] https://en.wikipedia.org/wiki/Galois/Counter_Mode
[17] https://csrc.nist.gov/publications/detail/sp/800-38d/final
[18]
https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/#cryptography.hazmat.primitives.ciphers.modes.GCM
[19] https://www.2ndquadrant.com/en/blog/postgresql-maximum-table-size/
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2019-08-06 22:17:59 | Re: Problem with default partition pruning |
Previous Message | Stephen Frost | 2019-08-06 21:43:55 | Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions |