From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Jacob Champion <pchampion(at)vmware(dot)com>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Transparent column encryption |
Date: | 2021-12-08 01:58:06 |
Message-ID: | 3c48f839-45f0-b85d-fd99-8693171f515e@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/8/21 00:26, Jacob Champion wrote:
> On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote:
>> IMO it's impossible to solve this attack within TCE, because it requires
>> ensuring consistency at the row level, but TCE obviously works at column
>> level only.
>
> I was under the impression that clients already had to be modified to
> figure out how to encrypt the data? If part of that process ends up
> including enforcement of encryption for a specific column set, then the
> addition of AEAD data could hypothetically be part of that hand-
> waviness.
>
I think "transparency" here means the client just uses the regular
prepared-statement API without having to explicitly encrypt/decrypt any
data. The problem is we can't easily tie this to other columns in the
table, because the client may not even know what values are in those
columns.
Imagine you do this
UPDATE t SET encrypted_column = $1 WHERE another_column = $2;
but you want to ensure the encrypted value belongs to a particular row
(which may or may not be identified by the another_column value). How
would the client do that? Should it fetch the value or what?
Similarly, what if the client just does
SELECT encrypted_column FROM t;
How would it verify the values belong to the row, without having all the
data for the row (or just the required columns)?
> Unless "transparent" means that the client completely defers to the
> server on whether to encrypt or not, and silently goes along with it if
> the server tells it not to encrypt?
I think that's probably a valid concern - a "bad DBA" could alter the
table definition to not contain the "ENCRYPTED" bits, and then peek at
the plaintext values.
But it's not clear to me how exactly would the AEAD prevent this?
Wouldn't that be also specified on the server, somehow? In which case
the DBA could just tweak that too, no?
In other words, this issue seems mostly orthogonal to the AEAD, and the
right solution would be to allow the client to define which columns have
to be encrypted (in which case altering the server definition would not
be enough).
> That would only protect against a
> _completely_ passive DBA, like someone reading unencrypted backups,
> etc. And that still has a lot of value, certainly. But it seems like
> this prototype is very close to a system where the client can reliably
> secure data even if the server isn't trustworthy, if that's a use case
> you're interested in.
>
Right. IMHO the "passive attacker" is a perfectly fine model for use
cases that would be fine with e.g. pgcrypto if there was no risk of
leaking plaintext values to logs, system catalogs, etc.
If we can improve it to provide (at least some) protection against
active attackers, that'd be a nice bonus.
>> I believe TCE can do AEAD at the column level, which protects against
>> attacks that flipping bits, and similar attacks. It's just a matter of
>> how the client encrypts the data.
>
> Right, I think authenticated encryption ciphers (without AD) will be
> important to support in practice. I think users are going to want
> *some* protection against active attacks.
>
>> Extending it to protect the whole row seems tricky, because the client
>> may not even know the other columns, and it's not clear to me how it'd
>> deal with things like updates of the other columns, hint bits, dropped
>> columns, etc.
>
> Covering the entire row automatically probably isn't super helpful in
> practice. As you mention later:
>
>> It's probably possible to get something like this (row-level AEAD) by
>> encrypting enriched data, i.e. not just the card number, but {user ID,
>> card number} or something like that, and verify that in the webapp. The
>> problem of course is that the "user ID" is just another column in the
>> table, and there's nothing preventing the DBA from modifying that too.
>
> Right. That's why the client has to be able to choose AD according to
> the application. In my previous example, the victim's email address can
> be copied by the DBA, but they wouldn't be able to authenticate as that
> user and couldn't convince the client to use the plaintext on their
> behalf.
>
Well, yeah. But I'm not sure how to make that work easily, because the
client may not have the data :-(
I was thinking about using a composite data type combining the data with
the extra bits - that'd not be all that transparent as it'd require the
client to build this manually and then also cross-check it after loading
the data. So the user would be responsible for having all the data.
But doing that automatically/transparently seems hard, because how would
you deal e.g. with SELECT queries reading data through a view or CTE?
How would you declare this, either at the client or server?
Do any other databases have this capability? How do they do it?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-12-08 02:04:23 | Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint? |
Previous Message | Bharath Rupireddy | 2021-12-08 01:54:41 | Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint? |