From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) |
Date: | 2019-05-08 13:32:08 |
Message-ID: | CA+TgmoaWiq+q5VOWPTzgi6sCvDjWbuLMHbCD4Ep_dDoLHZbguw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > That better not be true. If you have a design where reading the WAL
> > > lets you get *any* encryption key, you have a bad design, I think.
>
> How does the startup process decrypt WAL during recovery without
> getting any encryption key if we encrypt user data in WAL by multiple
> encryption keys?
The keys have to be supplied from someplace outside of the database
system. I am imagining a command that gets run with the key ID as an
argument and is expected to print the key out on standard output for
the server to read.
I am not an encryption expert, but it's hard for me to imagine this
working any other way. I mean, if you store the keys that you need
for decryption inside the database, isn't that the same as storing
your house key in your house, or your car key in your car? If you
store your car key in the car, then either the car is locked from the
outside, and the key is useless to you, or the car is unlocked from
the outside, and the key is just as available to a thief as it is to
you. Either way, it provides no security. What you do is keep your
car key in your pocket or purse; if you try to start the car, it
"requests" the key from you as proof that you are entitled to start
it. I think the database has to work similarly, except that rather
than protecting the act of "starting" the database, each key is
requested the first time it's needed, when it's discovered that we
need to decrypt some data encrypted with that key.
> > > Well, what threat are you trying to protect against?
>
> Data theft bypassing PostgreSQL's ACL, for example a malicious user
> thefts storage devices and reads datbase files directly.
>
> I'm thinking that only users who have an access privilege of the
> database object can get encryption key for the object. Therefore, when
> a malicious user stole an encryption key by breaking the access
> control system if we suppose data at rest encryption to serve as a yet
> another access control layer we have to use the same encryption key
> for WAL as that we used for database file. But I thought that we
> should rather protect data from that situation by access control
> system and managing encryption keys more robustly.
I don't really follow that logic. If the encryption keys are managed
robustly enough that they cannot be stolen, then we only need one. If
there is still enough risk of key theft that we care to protect
against it, we can't use a dedicated key for the WAL without
increasing the risk.
> > > > FWIW, binary log encryption of MySQL uses different encryption key
> > > > from a key used for table[1]. The key is encrypted by the master key
> > > > for binary log encryption and is stored in each file headers.
> > >
> > > So, if you steal the master key for binary log encryption, you can
> > > decrypt everything, it sounds like.
>
> Yes, I think so.
I am not keen to copy that design. It sounds like having multiple
keys in this design adds a lot of complexity without adding much
security.
> > > Data other than table and index data seems like it is not very
> > > security-sensitive. I'm not sure we need to encrypt it at all. If we
> > > do, using one key seems fine.
>
> Agreed. But it seems not to satisfy some user who require to encrypt
> everything, which we discussed before.
Agreed. I'm thinking possibly we need two different facilities.
Facility #1 could be whole-database encryption: everything is
encrypted with one key on a block level. And facility #2 could be
per-table encryption: blocks for specific tables (and the related
TOAST tables, indexes, and relation forks) are encrypted with specific
keys and, in addition, the WAL records for those tables (and the
related TOAST tables, indexes, and relation forks) are encrypted with
the same key, but on a per-WAL-record level; the original WAL record
would get "wrapped" by a new WAL record that just says "I am an
encrypted WAL record, key ID %d, encrypted contents: %s" and you have
to get the key to decrypt the contents and decrypt the real WAL record
inside of it. Then you process that interior record as normal.
I guess if you had both things, you'd want tables for which facility
#2 was enabled to bypass facility #1, so that no relation data blocks
were doubly-encrypted, to avoid the overhead. But a WAL record would
be doubly-encrypted when both facilities are in use: the record would
get encrypted with the per-table key, and then the blocks it got
stored into would be encrypted with the cluster-wide key.
> I wanted to say that if we encrypt whole database cluster by single
> encryption key we would need to rebuilt the database cluster when
> re-encrypt data. But if we encrypt data in tablespaces by per
> tablespace encryption keys we can re-encrypt data by moving
> tablespaces, without rebuilt it.
Interesting. I suppose that would also be true of per-table keys.
CREATE TABLE newthunk ENCRYPT WITH 'hoge' AS SELECT * FROM thunk; or
something of that sort.
Is there any real advantage of making this per-tablespace rather than
per-table in PostgreSQL's architecture? In some other systems, all the
stuff in a tablespace is glommed together into a big file or a raw
disk partiton or something, so if you used different keys for
different things in the tablespace then it might be hard to know which
key to use for which blocks, but we've got separate files for each
relation anyway. Now, that doesn't answer the question of how
recovery, which can't do pg_class lookups, knows which key to use for
which relation, but recovery can't do pg_tablespace lookups either.
But I think there's a simple answer for that: the encrypted 'wrapper'
WAL record must say which key should be used to decrypt the WAL record
inside of it. And that must be the same key ID that should be used
for the corresponding relation files that the WAL record touches. So
no problem!
I mean, no problem apart from writing a huge amount of very complex code...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2019-05-08 13:34:38 | Re: accounting for memory used for BufFile during hash joins |
Previous Message | Michael Paquier | 2019-05-08 13:05:38 | Re: Inconsistent error message wording for REINDEX CONCURRENTLY |