From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thom Brown <thom(at)linux(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Column Redaction |
Date: | 2014-10-15 19:41:24 |
Message-ID: | CAGTBQpb1y2gDD2j5MVzdC9L9Ee6jvKwE+JuPDSU+MVr+ePP-qA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 10 October 2014 16:45, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
> Redaction prevents accidental information loss only, forcing any loss
> that occurs to be explicit. It ensures that loss of information can be
> tied clearly back to an individual, like an ink packet that stains the
> fingers of a thief.
That is not true.
It can only be tied to a session. That's very far from an individual
in court terms, if you ask a lawyer.
You need a helluva lot more to tie that to an individual.
> Redaction clearly relies completely on auditing before it can have any
> additional effect. And the effectiveness of redaction needs to be
> understood next to Rod's example.
It forces you to audit all of the queries issued by the otherwise trusted user.
That is, I believe, a far from optimal design. When you have to audit
everything, you end up auditing nothing: a haystack of false positives
can easily hide the needle that is the true positive.
What you want is something that allows selective auditing of
leak-prone queries.
But we've seen that joining is already a leak-prone query, so clearly
you cannot allow simple joining if you want the above.
What I propose needs a schema change and some preparedness from the
DBA. But how can you assume that to be asking too much and not say
the same of thorough auditing?
So what I propose is to require explicit separation of concepts at
the schema level.
On Sat, Oct 11, 2014 at 10:43 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> For example, for a credit card type, you would output the last four
> digits, but is there any value to storing the non-visible digits? You
> can check the checksum of the digits, but that can be done on input and
> doesn't require the storage of the digits. Is there some function we
> could provide that would make that data type useful? Could we provide
> comparison functions with delays or increasing delays?
Basically, as said above, the point is to provide a data type that is
nigh-useless.
Imagine a redacted card number as a tuple (full_value_id, suffix).
Suffix is in cleartext, and full_value_id is just an id pointing to a
lookup table for the type.
Regular users can read any redacted_number column, but will only get
the id (useless unless they already know what that id maps to), and
suffix. Format for that type would be "**** suffix" and would serve
the purpose of the OP: it can be joined (equal value = equal id).
Moreover, the type can be designed in one of two ways: equal values
contain equal ids, or salted values, where even equal values generated
from different computations (i.e. not copied) have different ids. This
second mode would be the most secure, albeit a tad hard to use
perhaps.
But it would allow joining and everything. Only users that have access
to the lookup table would be allowed to resolve the full value, with a
function that is not SECURITY DEFINER, like:
extract_full_value(redacted_number)
Then you can audit all queries against the lookup table, and you have
rather strong security IMHO.
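To make the moving parts concrete, here is a rough sketch of the scheme in Python rather than SQL. It is only a model of the idea, not a proposed implementation: LookupTable, RedactedNumber, redact, resolvers and audit_log are names made up for illustration, and the resolvers set stands in for whatever privileges would guard the real lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class RedactedNumber:
    """What a regular user sees: an opaque id plus the cleartext suffix."""
    full_value_id: int
    suffix: str

    def __str__(self) -> str:
        return f"**** {self.suffix}"


@dataclass
class LookupTable:
    """Hypothetical stand-in for the per-type lookup table."""
    salted: bool = False                                # True: equal values get different ids
    resolvers: Set[str] = field(default_factory=set)    # users allowed to read the table
    audit_log: List[Tuple[str, int]] = field(default_factory=list)
    _by_id: Dict[int, str] = field(default_factory=dict)
    _by_value: Dict[str, int] = field(default_factory=dict)
    _next_id: int = 1

    def redact(self, full_value: str) -> RedactedNumber:
        # Unsalted mode reuses the id of an already known value,
        # so equal value = equal id and joins on the id work.
        if not self.salted and full_value in self._by_value:
            value_id = self._by_value[full_value]
        else:
            value_id = self._next_id
            self._next_id += 1
            self._by_id[value_id] = full_value
            self._by_value[full_value] = value_id
        return RedactedNumber(value_id, full_value[-4:])

    def extract_full_value(self, user: str, number: RedactedNumber) -> str:
        # Every attempt to touch the lookup table is audited, allowed or not.
        self.audit_log.append((user, number.full_value_id))
        if user not in self.resolvers:
            raise PermissionError(f"{user} cannot read the lookup table")
        return self._by_id[number.full_value_id]
```

The point of the sketch is that the audit trail only contains accesses to the lookup table, which is the small set of leak-prone operations, while joins and equality tests on the redacted column never show up in it.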
This can all be done without any new features to postgres. Maybe you
can add syntactic sugar, but you don't really need anything on the
core to accomplish the above.
The syntactic sugar can take the form of a new data type family (like
enum?) where you specify the redaction function, redacted data type,
output format, and from there everything else works automagically, with
an
extract_full(any) -> any
function that somehow knows what to do.
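One way such a generic function could "know what to do" is for each redacted value to carry the name of its family, so the lookup can be dispatched on it. Again a Python sketch only, with access checks left out for brevity; define_family, RedactedFamily, Redacted and FAMILIES are hypothetical names, not anything that exists in postgres.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Redacted:
    family: str     # which family (and so which lookup table) this belongs to
    value_id: int   # opaque id into that family's lookup table
    visible: Any    # the redacted, cleartext part


@dataclass
class RedactedFamily:
    name: str
    redact_fn: Callable[[Any], Any]   # full value -> visible part
    output_format: str                # e.g. "**** {}"
    _store: Dict[int, Any] = field(default_factory=dict)

    def make(self, full: Any) -> Redacted:
        value_id = len(self._store) + 1
        self._store[value_id] = full
        return Redacted(self.name, value_id, self.redact_fn(full))

    def show(self, value: Redacted) -> str:
        return self.output_format.format(value.visible)


# Registry of all declared families, keyed by name.
FAMILIES: Dict[str, RedactedFamily] = {}


def define_family(name: str, redact_fn: Callable[[Any], Any],
                  output_format: str) -> RedactedFamily:
    family = RedactedFamily(name, redact_fn, output_format)
    FAMILIES[name] = family
    return family


def extract_full(value: Redacted) -> Any:
    # Dispatch on the family recorded in the value itself.
    return FAMILIES[value.family]._store[value.value_id]


cards = define_family("card", lambda s: s[-4:], "**** {}")
card = cards.make("4111111111111111")
print(cards.show(card))
```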
On Wed, Oct 15, 2014 at 3:57 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 15 October 2014 19:46, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>> In IT terms, we're looking at controlling and reducing improper access
>>> to data by an otherwise Trusted person. The only problem is that some
>>> actions on data items are allowed, others are not.
>>
>> Sure, I don't disagree with any of that as a general principle. I
>> just think we should look for some ways of shoring up your proposal
>> against some of the more obvious attacks, so as to have more good and
>> less bad.
>
> Suggestions welcome. I'm not in a rush to implement this, so we have
> time to mull it over.
Does the above work for your intended purposes?
Hard to know from what you've posted until now, but I believe it does.