Re: Transparent column encryption

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jacob Champion <pchampion(at)vmware(dot)com>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2021-12-09 10:04:26
Message-ID: 163b3295-7046-a12f-055e-55b68c594fb9@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/9/21 01:12, Jacob Champion wrote:
> On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote:
>>
>> On 12/8/21 00:26, Jacob Champion wrote:
>>> On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote:
>>>> IMO it's impossible to solve this attack within TCE, because it requires
>>>> ensuring consistency at the row level, but TCE obviously works at column
>>>> level only.
>>>
>>> I was under the impression that clients already had to be modified to
>>> figure out how to encrypt the data? If part of that process ends up
>>> including enforcement of encryption for a specific column set, then the
>>> addition of AEAD data could hypothetically be part of that hand-
>>> waviness.
>>
>> I think "transparency" here means the client just uses the regular
>> prepared-statement API without having to explicitly encrypt/decrypt any
>> data. The problem is we can't easily tie this to other columns in the
>> table, because the client may not even know what values are in those
>> columns.
>
> The way I originally described my request -- "I'd like to be able to
> tie an encrypted value to other column (or external) data" -- was not
> very clear.
>
> With my proposed model -- where the DBA (and the server) are completely
> untrusted, and the DBA needs to be prevented from using the encrypted
> value -- I don't think there's a useful way for the client to use
> associated data that comes from the server. The client has to know what
> the AD should be beforehand, because otherwise the DBA can make it so
> the server returns whatever is correct.
>

True. With untrusted server the additional data would have to come from
some other source. Say, an isolated auth system or so.

>> Imagine you do this
>>
>> UPDATE t SET encrypted_column = $1 WHERE another_column = $2;
>>
>> but you want to ensure the encrypted value belongs to a particular row
>> (which may or may not be identified by the another_column value). How
>> would the client do that? Should it fetch the value or what?
>>
>> Similarly, what if the client just does
>>
>> SELECT encrypted_column FROM t;
>>
>> How would it verify the values belong to the row, without having all the
>> data for the row (or just the required columns)?
>
> So with my (hopefully more clear) model above, it wouldn't. The client
> would already have the AD, and somehow tell libpq what that data was
> for the query.
>
> The rabbit hole I led you down is one where we use the rest of the row
> as AD, to try to freeze pieces of it in place. That might(?) have some
> useful security properties (if the client defines its use and doesn't
> defer to the server). But it's not what I intended to propose and I'd
> have to think about that case some more.
>

OK

> In my credit card example, I'm imagining something like (forgive the
> contrived syntax):
>
> SELECT address, :{aead(users.credit_card, 'user(at)example(dot)com')}
> FROM users WHERE email = 'user(at)example(dot)com';
>
> UPDATE users
> SET :{aead(users.credit_card, 'user(at)example(dot)com')} = '1234-...'
> WHERE email = 'user(at)example(dot)com';
>
> The client explicitly links a table's column to its AD for the duration
> of the query. This approach can't scale to
>
> SELECT credit_card FROM users;
>
> because in this case the AD for each row is different, but I'd argue
> that's ideal for this particular case. The client doesn't need to (and
> probably shouldn't) grab everyone's credit card details all at once, so
> there's no reason to optimize for it.
>

Maybe, but it seems like a rather annoying limitation, as it restricts
the client to single-row queries (or at least it looks like that to me).
Yes, it may be fine for some use cases, but I'd bet a DBA who can modify
data can do plenty other things - swapping "old" values, which will have
the right AD, for example.

>>> Unless "transparent" means that the client completely defers to the
>>> server on whether to encrypt or not, and silently goes along with it if
>>> the server tells it not to encrypt?
>> I think that's probably a valid concern - a "bad DBA" could alter the
>> table definition to not contain the "ENCRYPTED" bits, and then peek at
>> the plaintext values.
>>
>> But it's not clear to me how exactly would the AEAD prevent this?
>> Wouldn't that be also specified on the server, somehow? In which case
>> the DBA could just tweak that too, no?
>>
>> In other words, this issue seems mostly orthogonal to the AEAD, and the
>> right solution would be to allow the client to define which columns have
>> to be encrypted (in which case altering the server definition would not
>> be enough).
>
> Right, exactly. When I mentioned AEAD I had assumed that "allow the
> client to define which columns have to be encrypted" was already
> planned or in the works; I just misunderstood pieces of Peter's email.
> It's that piece where a client would probably have to add details
> around AEAD and its use.
>
>>> That would only protect against a
>>> _completely_ passive DBA, like someone reading unencrypted backups,
>>> etc. And that still has a lot of value, certainly. But it seems like
>>> this prototype is very close to a system where the client can reliably
>>> secure data even if the server isn't trustworthy, if that's a use case
>>> you're interested in.
>>
>> Right. IMHO the "passive attacker" is a perfectly fine model for use
>> cases that would be fine with e.g. pgcrypto if there was no risk of
>> leaking plaintext values to logs, system catalogs, etc.
>>
>> If we can improve it to provide (at least some) protection against
>> active attackers, that'd be a nice bonus.
>
> I agree that resistance against offline attacks is a useful step
> forward (it seems to be a strict improvement over pgcrypto). I have a
> feeling that end users will *expect* some protection against online
> attacks too, since an evil DBA is going to be well-positioned to do
> exactly that.
>

Yeah.

>>>> It's probably possible to get something like this (row-level AEAD) by
>>>> encrypting enriched data, i.e. not just the card number, but {user ID,
>>>> card number} or something like that, and verify that in the webapp. The
>>>> problem of course is that the "user ID" is just another column in the
>>>> table, and there's nothing preventing the DBA from modifying that too.
>>>
>>> Right. That's why the client has to be able to choose AD according to
>>> the application. In my previous example, the victim's email address can
>>> be copied by the DBA, but they wouldn't be able to authenticate as that
>>> user and couldn't convince the client to use the plaintext on their
>>> behalf.
>>
>> Well, yeah. But I'm not sure how to make that work easily, because the
>> client may not have the data :-(
>>
>> I was thinking about using a composite data type combining the data with
>> the extra bits - that'd not be all that transparent as it'd require the
>> client to build this manually and then also cross-check it after loading
>> the data. So the user would be responsible for having all the data.
>>
>> But doing that automatically/transparently seems hard, because how would
>> you deal e.g. with SELECT queries reading data through a view or CTE?
>>
>> How would you declare this, either at the client or server?
>
> I'll do some more thinking on the case you're talking about here, where
> pieces of the row are transparently tied together.
>

OK. In any case, I think we shouldn't require this capability from the
get go - it's fine to get the simple version done first, which gives us
privacy / protects against passive attacker. And then sometime in the
future improve this further.

>> Do any other databases have this capability? How do they do it?
>
> BigQuery advertises AEAD support. I don't think their model is the same
> as ours, though; from the docs it looks like it's essentially pgcrypto,
> where you tell the server to encrypt stuff for you.
>

Pretty sure it's server-side. The docs say it's for encryption at rest,
all the examples do the encryption/decryption in SQL, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-12-09 10:05:38 Re: parallel vacuum comments
Previous Message Peter Eisentraut 2021-12-09 09:26:01 Re: Replace uses of deprecated Python module distutils.sysconfig