From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Sehrope Sarkuni <sehrope(at)jackdb(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) |
Date: | 2019-08-16 13:30:58 |
Message-ID: | 20190816133058.GA8400@momjian.us |
Lists: | pgsql-hackers |
On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > I assume you are talking about my option #1. I can see if you only need
> > a few tables encrypted, e.g., credit card numbers, it can be excessive
> > to encrypt the entire cluster. (I think you would need to encrypt
> > pg_statistic too.)
>
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.
Yes.
> > The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> > overhead might be minimal compared to the WAL encryption overhead. The
> > better solution would be to add a flag to WAL records to indicate
> > encrypted entries, but you would then leak when an encryption change
> > happens and WAL record length. (FYI, numeric values have different
> > lengths, as do character strings.) I assume we would still use a single
> > key for all tables/indexes, and one for WAL, plus key rotation
> > requirements.
>
> I don't think the fact that a change was done to an encrypted blob is an
> actual 'leak'- anyone can tell that by looking at the encrypted
> data before and after. Further, the actual change would be encrypted,
> right? Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well known source discussing that
> concern..?
Yes, it is a minor negative, but we would need to see some performance
reason to have that minor negative, and I have already stated why I
think there might be no performance reason to do so. Masahiko Sawada's
talk at PGCon 2019 supports that conclusion:
https://www.youtube.com/watch?v=TXKoo2SNMzk
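To make the record-length point concrete, here is a minimal Python sketch, not from the thread, using a toy length-preserving cipher: a SHA-256 counter-mode keystream XORed with the data. It is an illustration only, not a vetted cipher and not what any PostgreSQL patch uses; the key, nonces, and payloads are hypothetical. Any stream or CTR-style encryption preserves length, so per-record encrypted WAL entries still reveal how long each value is.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream built from SHA-256(key || nonce || counter).

    Illustration only: NOT a vetted cipher and not what PostgreSQL uses.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice decrypts.
    # The output is always exactly as long as the input.
    ks = toy_keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


# Hypothetical key, nonces, and record payloads for illustration.
key = b"k" * 32
short_rec = toy_encrypt(key, b"lsn-0001", b"42")
long_rec = toy_encrypt(key, b"lsn-0002", b"123456789012345678901234567890")

# The ciphertext lengths expose the plaintext lengths.
print(len(short_rec), len(long_rec))  # 2 30
```

Padding every encrypted record to a fixed size would hide the length, at the cost of larger WAL.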
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is. As I stated earlier, the overhead of
> > determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
>
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a valid
> or useful feature to implement. The biggest challenge here is really
We implemented UNLOGGED tables because there was a clear performance win
to doing so. I have not seen any measurements for encryption,
particularly when WAL is considered.
> around key management and I agree that's difficult but it's also really
> important and something that we need to be thinking about- and thinking
> about how to work with multiple keys and not just one. Building in an
> assumption that we will only ever work with one key would make this
> capability nothing more than DBA-managed filesystem-level encryption
Agreed, that's all it is.
> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen. It's also not clear to me
> why we aren't building in *some* mechanism to work with multiple keys
> from the start as part of the initial design.
Well, every time I look at multiple keys, I go over exactly what that
means and how it behaves, but get no feedback on how to address the
problems.
> > I can see why you would think that encrypting less would be easier than
> > encrypting more, but security boundaries are hard to construct, and
> > anything that requires a user API, even more so.
>
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc. I don't think anyone thinks that'll be "easier".
Uh, I thought Masahiko Sawada stated that, but looking back, I don't see
it, so I must be wrong.
> > > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > > mentioned in [3] seems viable, although I think [1] uses approach which is
> > > > > more convenient if the passphrase should be read from console.)
> > >
> > > I think that we can also provide a way to pass the encryption key directly
> > > to the postmaster rather than using a passphrase. Since users commonly
> > > store keys in a KMS, it would be useful if we could do that.
> >
> > Why would it not be simpler to have the cluster_passphrase_command run
> > whatever command-line program it wants? If you don't want to use a
> > shell command, create an executable and call that.
>
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as a
> passphrase command, which I believe is what was being suggested here.
OK, I am just trying to see why we would not use the
cluster_passphrase_command-like interface to do that.
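A minimal sketch of the cluster_passphrase_command-style interface described above: the server runs an arbitrary shell command and reads the passphrase from its standard output, so a shell one-liner, a script, or a small KMS client executable can all plug in the same way. The helper name run_passphrase_command and its "first line of stdout" convention are hypothetical; the proposed GUC's exact semantics were still under discussion.

```python
import subprocess


def run_passphrase_command(command: str) -> str:
    """Run a shell command and return the first line of its stdout.

    Hypothetical helper modeled on the proposed cluster_passphrase_command;
    not an actual PostgreSQL API.
    """
    result = subprocess.run(
        command,
        shell=True,           # the setting is a shell command line
        capture_output=True,  # read the passphrase from stdout
        text=True,
        check=True,           # non-zero exit raises CalledProcessError
    )
    lines = result.stdout.splitlines()
    if not lines or not lines[0]:
        raise RuntimeError("passphrase command produced no output")
    return lines[0]


# Any program works here: echo, a script, or a KMS client executable.
print(run_passphrase_command("echo 'correct horse battery staple'"))
```

Because the server only sees a command string and its output, KMS-specific logic lives entirely in the external program, which is the tradeoff against a loadable library: simpler to integrate, but each call pays the cost of starting a process.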
> > > > > Rotation of
> > > > > the master key is another thing that both versions of the feature should do in
> > > > > the same way. And of course, the frontend applications need a consistent approach
> > > > > too.
> > > >
> > > > I don't see the value of an external library for key storage.
> > >
> > > I think that a big benefit is that PostgreSQL can seamlessly work with
> > > external services such as a KMS. For instance, during key rotation,
> > > PostgreSQL can register a new key with the KMS and use it, and it can
> > > remove keys when they are no longer necessary. That is, it can enable
> > > PostgreSQL not only to get keys from the KMS but also to register and
> > > remove keys. We can also decrypt the MDEK in the KMS instead of in
> > > PostgreSQL, which is safer. In addition, once someone creates a
> > > plugin library for an external service, individual projects don't need
> > > to create their own.
> >
> > I think the big win for an external library is when you don't want the
> > overhead of calling an external program. For example, we certainly
> > would not want to call an external program while processing a query. Do
> > we have any such requirements for encryption, especially since we are
> > only going to allow offline mode for encryption mode changes and key
> > rotation in the first version?
>
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a
Well, "grating" doesn't change any facts. If you want to change that,
you will need to do as I stated earlier:
https://www.postgresql.org/message-id/20190810021716.ovpqenqjb3b7uokc@momjian.us
> few people who are interested in making progress here and a small number
> of others who are pushing back and putting up limitations that "the
> first version can't have X" or "the first version can't have Y".
>
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we don't
> bake in assumptions that will later become very difficult for us to work
> ourselves out of (especially when it comes to user interface and things
> like GUCs...), but where we decide to draw a line shouldn't be based on
> assumptions about what's going to be difficult and what isn't- let's let
> those who want to work on this capability work on it and as we see the
> progress, if there's issues which come up with a specific area that seem
> likely to prove difficult to include, then we can consider backing away
> from that while keeping it in mind while doing further development.
I have seen no one present a clear description of how anything beyond
all-cluster encryption would work or be secure. Wishing that were not
the case doesn't change things.
> In other words, I feel like we're getting trapped here in a
> "requirements definition" phase of a traditional waterfall-style
> development cycle where we have to decide, up front, the EXACT set of features
> and capabilities that we want and then we are going to expect people to
> develop according to EXACTLY that set, and we'll shoot down anything
> that comes across which is trying to do more or is trying to be more
> flexible in anticipation of capabilities that we know we will want down
> the road. It's likely clear already but I'll say it anyway- I don't
> think it's a good idea to go down that route.
I will continue to shoot down whatever I think has no reasonable chance
of working. I can just let it go and watch it fail, but I don't see
that as a good approach.
I will state what I have already told some people privately: for
this feature, we have many people understanding 40% of the problem, but
thinking they understand 90%. I do agree we should plan for our
eventual full feature set, but I can't figure out what that feature set
looks like beyond full-cluster encryption, and no one is addressing my
concerns to move that forward. Vague complaints that they don't like the
process are not changing that.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +