Re: Transparent Data Encryption (TDE) and encrypted files

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent Data Encryption (TDE) and encrypted files
Date: 2019-10-07 15:02:37
Message-ID: 20191007150236.GB4732@momjian.us
Lists: pgsql-hackers

On Mon, Oct 7, 2019 at 09:44:30AM -0400, Robert Haas wrote:
> On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > We spent a lot of time figuring out exactly how to safely encrypt WAL,
> > heap, index, and pgsql_tmp files. The idea of doing this for another
> > 20 types of files --- to find a safe nonce, to be sure a file rewrite
> > doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> > tool interface --- is something I would like to avoid. I want to avoid
> > it not because I don't like work, but because I am afraid the code
> > impact and fragility will doom the feature.
>
> I'm concerned about that, too, but there's no getting around the fact
> that there are a bunch of types of files and that they do all need to
> be dealt with. If we have a good scheme for doing that, hopefully
> extending it to additional types of files is not that bad, which would
> then spare us the trouble of arguing about each one individually, and
> also be more secure.

Well, to do encryption properly, there is the requirement of the nonce.
If you ever rewrite a bit, you technically have to have a new nonce.
For WAL, since it is append-only, you can use the WAL file name. For
heap/index files, we change the LSN on every rewrite (with
wal_log_hints=on), and we never use the same LSN for writing multiple
relations, so LSN+page-offset is a sufficient nonce.
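
As a rough illustration of the LSN+page-offset idea, here is a minimal
Python sketch. The function name page_nonce, the 16-byte layout, and the
field widths are assumptions made for this example only; they are not
taken from PostgreSQL or from any posted patch.

```python
import struct


def page_nonce(lsn: int, page_offset: int) -> bytes:
    """Build a 16-byte nonce from a page LSN and the page's offset.

    Hypothetical layout (an assumption for illustration):
      bytes 0-7   : 64-bit LSN, big-endian
      bytes 8-11  : 32-bit page offset (block number), big-endian
      bytes 12-15 : zero, left free for a per-page block counter
    """
    if not 0 <= lsn < 2**64:
        raise ValueError("lsn must fit in 64 bits")
    if not 0 <= page_offset < 2**32:
        raise ValueError("page_offset must fit in 32 bits")
    return struct.pack(">QII", lsn, page_offset, 0)


# Rewriting the same page under a new LSN yields a different nonce,
# and two pages written under the same LSN differ by their offset.
first_write = page_nonce(0x0123456789ABCDEF, 7)
rewrite = page_nonce(0x0123456789ABCDF0, 7)
neighbor = page_nonce(0x0123456789ABCDEF, 8)
```

The point of the sketch is only that the nonce changes whenever either
the LSN or the page offset changes, which is the property relied on
above.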

For clog, it is not append-only, and bytes are rewritten (from zero to
non-zero), so there would have to be a new nonce for every clog file
write to the file system. We can store the nonce in a separate file,
but the clog contents and nonce would have to be always synchronized or
the file could not be properly read. Basically, every file we want to
encrypt needs this kind of study.
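
To make the synchronization problem concrete, here is a small Python
sketch. Everything in it is hypothetical: the dict stands in for the
two on-disk files, the names write_segment and read_segment are made
up, and the SHA-256-based keystream is only a toy stand-in for a real
cipher such as AES in a streaming mode.

```python
import hashlib


def xor_stream(key: bytes, nonce: int, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block_input = key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block_input).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def write_segment(store: dict, key: bytes, plaintext: bytes,
                  crash_before_nonce: bool = False) -> None:
    """Encrypt under a fresh nonce, then record the nonce separately."""
    new_nonce = store.get("nonce", 0) + 1  # new nonce for every rewrite
    store["data"] = xor_stream(key, new_nonce, plaintext)
    if crash_before_nonce:
        return  # simulated crash: data written, nonce file left stale
    store["nonce"] = new_nonce


def read_segment(store: dict, key: bytes) -> bytes:
    """Decrypt the data using whatever nonce is currently recorded."""
    return xor_stream(key, store["nonce"], store["data"])


key = b"example key, not a real secret"
store = {}
write_segment(store, key, b"\x00\x00\x00\x00")
clean_read = read_segment(store, key)  # matches what was written

write_segment(store, key, b"\x01\x00\x00\x00", crash_before_nonce=True)
torn_read = read_segment(store, key)   # stale nonce: contents unreadable
```

After the simulated crash the data was encrypted under one nonce while
the recorded nonce is still the previous one, so the read returns
neither the old nor the intended contents.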

> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch - unless that patch has,

Someone from Cybertec is on the voice calls we have, and is actively
involved.

> like, typhoid, or something, in which case please let me know so that
> I, too, can avoid looking at it. Even if you wanted to use 0% of the
> code, you could look at the list of file types that they consider
> encrypting and think about whether you agree with the decisions they
> made. I suspect that you would quickly find that you've left some
> things out of your list. In fact, I can think of a couple pretty clear
> examples, like the stats files, which clearly contain user data.

I am asking here because I don't think the Cybertec approach has gotten
enough study compared to what this group can contribute.

> Another reason that you should go look at that patch is because it
> actually tries to grapple with the exact problem that you're worrying
> about in the abstract: there are a LOT of different kinds of files and
> they all need to be handled somehow. Even if you can convince yourself
> that things like pg_clog don't need encryption, which I think is a
> pretty tough sell, there are a LOT of file types that directly contain
> user data and do need to be handled. A lot of the code that writes
> those various types of files is pretty ad-hoc. It doesn't necessarily
> do nice things like build up a block of data and then write it out
> together; it may for example write a byte at a time. That's not going to
> work well for encryption, I think, so the Cybertec patch changes that

Actually, byte-at-a-time works fine with CTR mode, though that mode is
very sensitive to the reuse of the nonce since the user data is not part
of the input for future encryption blocks.
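
Both properties can be shown with a short Python sketch. It assumes a
toy counter-mode construction built from SHA-256 in place of AES-CTR,
and the names keystream and ctr_xor are invented for this example; it
is not how any patch implements encryption.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR keystream: SHA-256(key || nonce || counter) per block."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def ctr_xor(key: bytes, nonce: bytes, data: bytes, offset: int = 0) -> bytes:
    """XOR data with the keystream starting at the given byte offset."""
    ks = keystream(key, nonce, offset + len(data))[offset:]
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"example key, not a real secret"
nonce = b"segment-0001"
plaintext = b"some stats file contents"

# Byte-at-a-time: the keystream does not depend on the data, so each
# byte can be encrypted on its own as long as its offset is known.
whole = ctr_xor(key, nonce, plaintext)
piecewise = b"".join(
    ctr_xor(key, nonce, plaintext[i:i + 1], offset=i)
    for i in range(len(plaintext))
)

# Nonce reuse: rewriting in place under the same nonce lets an observer
# XOR the two ciphertexts and recover the XOR of the two plaintexts.
old_page = b"status bits: 0000"
new_page = b"status bits: 0100"
leak = bytes(a ^ b for a, b in zip(ctr_xor(key, nonce, old_page),
                                   ctr_xor(key, nonce, new_page)))
expected = bytes(a ^ b for a, b in zip(old_page, new_page))
```

Here whole and piecewise are identical, and leak equals expected, which
exposes exactly which bytes changed between the two writes.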

> stuff around. I personally don't think that the patch does that in a
> way that is sufficiently clean and carefully considered for it to be
> integrated into core, and my plan had been to work on that with the
> patch authors.
>
> However, that plan has been somewhat derailed by the fact that we now
> have hundreds of emails arguing about the design, because I don't want
> to be trying to push water up a hill if everyone else is going in a
> different direction. It looks to me, though, like we haven't really
> gotten beyond the point where that patch already was. The issues of
> nonce and many file types have already been thought about carefully
> there. I rather suspect that they did not get it all right. But, it
> seems to me that it would be a lot more useful to look at the code
> actually written and think about what it gets right and wrong than to
> discuss these points as a strictly theoretical matter.
>
> In other words: maybe I'm wrong here, but it looks to me like we're
> laboriously reinventing the wheel when we could be working on
> improving the working prototype.

The work being done is building on that prototype.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
