Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-06 01:02:34
Message-ID: 20190806010234.x57h4orvxhtfe5bo@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 31, 2019 at 09:25:01AM -0400, Sehrope Sarkuni wrote:
> On Tue, Jul 30, 2019 at 4:48 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> I had more time to think about the complexity of adding relfilenode to
> the IV.  Since relfilenode is only unique within a database/tablespace,
> we would need to have pg_upgrade preserve database/tablespace oids
> (which I assume are the same as the directory and tablespace symlinks).
> Then, to decode a page, you would need to look up those values.  This is
> in addition to the new complexity of CREATE DATABASE and moving files
> between tablespaces.  I am also concerned that crash recovery operations
> and cluster forensics and repair would need to also deal with this.
>
> I am not even clear if pg_upgrade preserving relfilenode is possible ---
> when we wrap the relfilenode counter, does it start at 1 or at the
> first-user-relation-oid?  If the former, it could conflict with oids
> assigned to new system tables in later major releases.  Tying the
> preservation of relations to two restrictions seems risky.
>
>
> Agreed. Unless you know for sure the input is going to immutable across copies
> or upgrades, including anything in either the IV or key derivation gets risky
> and could tie you down for the future. That's partly why I like the idea
> separate salt (basically you directly pay for the complexity by tracking that).

Yes, fragility is something to be concerned about. The system is
already very complex, and we occasionally have to do forensic work or
repairs.

> Even if we do not include a separate per-relation salt or things like
> relfilenode when generating a derived key, we can still include other types of
> immutable attributes. For example the fork type could be included to eventually
> allow multiple forks for the same relation to be encrypted with the same IV =
> LSN + Page Number as the derived key per-fork would be distinct.

Yes, the fork number could be useful in this case. I was thinking we
would just leave the extra bits as zeros and we can then set it to '1'
or something else for a different fork.

>
> Using just the page LSN and page number allows a page to be be
> decrypted/encrypted independently of its file name, tablespace, and
> database, and I think that is a win for simplicity.  Of course, if it is
> insecure we will not do it.
>
>
> As LSN + Page Number combo is unique for all relations (not just one relation)
> I think we're good for pages.

Yes, since a single LSN can only be used for a single relation, and I
added an Assert to check that. Good.

> I am thinking for the heap/index IV, it would be:
>
>         uint64 lsn;
>         unint32 page number;
>         /* only uses 11 bits for a zero-based CTR counter for 32k pages */
>         uint32 counter;
>
>
> Looks good. 
>  
>
> and for WAL it would be:
>
>         uint64 segment_number;
>         uint32    counter;
>         /* guarantees this IV doesn't match any relation IV */
>         uint32   2^32-1 /* all 1's */   
>
>
> I need to read up more on the structure of the WAL records but here's some high
> level thoughts:
>
> WAL encryption should not use the same key as page encryption so there's no
> need to design the IV to try to avoid matching the page IVs. Even a basic
> derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF(MDEK,
> "PAGE") would ensure separate keys. That's the the literal string "WAL" or
> "PAGE" being added as a salt to generate the respective keys, all that matters
> is they're different.

I was thinking the WAL would use the same key since the nonce is unique
between the two. What value is there in using a different key?

> Ideally WAL encryption would generating new derived keys as part of the WAL
> stream. The WAL stream is not fixed so you have the luxury of being able to add
> a "Use new random salt XZY going forward" records. Forcing generation of a new
> salt/key upon promotion of a replica would ensure that at least the WAL is
> unique going forward. Could also generate a new upon server startup, after

Ah, yes, good point, and using a derived key would make that easier.
The tricky part is what to use to create the new derived key, unless we
generate a random number and store that somewhere in the data directory,
but that might lead to fragility, so I am worried. We have pg_rewind,
which allows to make the WAL go backwards. What is the value in doing
this?

> every N bytes, or a new one for each new WAL file. There's much more
> flexibility compared to page encryption.
>
> As WAL is a single continuous stream, we can start the IV for each derived WAL
> key from zero. There's no need to complicate it further as Key + IV will never
> be reused.

Uh, you want a new random key for each WAL file? I was going to use the
WAL segment number as the nonce, which is always increasing, and easily
determined. The file is 16MB.

> If WAL is always written as full pages we need to ensure that the empty parts
> of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR of
> the empty section of the first write of a page against a subsequent one would
> give you the plain text.

Right, I think we need the segment number as part of the nonce for WAL.

> The non-fixed size of the WAL allows for the addition of a MAC though I'm not
> sure yet the best way to incorporate it. It could be part of each encrypted
> record or its own summary record (providing a MAC for a series of WAL records).
> After I've gone through this a bit more I'm looking to put together a write up
> with this and some other thoughts in one place.

I don't think we want to add a MAC at this point since the MAC for 8k
pages seems unattainable.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2019-08-06 01:28:18 Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Previous Message Stephen Frost 2019-08-06 00:52:26 Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions