Re: Why the index is not used ?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: ROS Didier <didier(dot)ros(at)edf(dot)fr>, "folarte(at)peoplecall(dot)com" <folarte(at)peoplecall(dot)com>
Cc: "pavel(dot)stehule(at)gmail(dot)com" <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-sql(at)lists(dot)postgresql(dot)org" <pgsql-sql(at)lists(dot)postgresql(dot)org>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Why the index is not used ?
Date: 2018-10-07 20:07:44
Message-ID: 88005798-8b90-3535-cc1c-a915a9e12d17@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance pgsql-sql

Hi,

On 10/07/2018 08:32 PM, ROS Didier wrote:
> Hi Francisco
>
> Thank you for your remark.
> You're right, but it's the only procedure I found to make search on
> encrypted fields with good response times (using index) !
>

Unfortunately, that kinda invalidates the whole purpose of in-database
encryption - you'll have encrypted on-disk data in one place, and then
plaintext right next to it. If you're dealing with credit card numbers,
then you presumably care about PCI DSS, and this is likely a direct
violation of that.

> Regarding access to the file system, our servers are in protected
network areas. few people can connect to it.
>

Then why do you need encryption at all? If you assume access to the
filesystem / storage is protected, why do you bother with encryption?
What is your threat model?

> it's not the best solution, but we have data encryption needs and
> good performance needs too. I do not know how to do it except the
> specified procedure..
>
> if anyone has any proposals to put this in place, I'm interested.
>

One thing you could do is hashing the value and then searching by the
hash. So aside from having the encrypted column you'll also have a short
hash, and you may use it in the query *together* with the original
condition. It does not need to be unique (in fact it should not be to
make it impossible to reverse the hash), but it needs to have enough
distinct values to make the index efficient. Say, 10k values should be
enough, because that means 0.01% selectivity.

So the function might look like this, for example:

CREATE FUNCTION cchash(text) RETURNS int AS $$
SELECT abs(hashtext($1)) % 10000;
$$ LANGUAGE sql;

and then be used like this:

CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cchash(cc));

and in the query

SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit
WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'
AND cchash(cc) = cchash('test value 32');

Obviously, this does not really solve the issues with having to pass the
password to the query, making it visible in pg_stat_activity, various
logs etc.

Which is why people generally use FDE for the whole disk, which is
transparent and provides the same level of protection.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Paul McGarry 2018-10-07 22:10:30 Re: Why the index is not used ?
Previous Message Vladimir Ryabtsev 2018-10-07 19:32:46 Re: Why the index is not used ?

Browse pgsql-performance by date

  From Date Subject
Next Message Paul McGarry 2018-10-07 22:10:30 Re: Why the index is not used ?
Previous Message Vladimir Ryabtsev 2018-10-07 19:32:46 Re: Why the index is not used ?

Browse pgsql-sql by date

  From Date Subject
Next Message Paul McGarry 2018-10-07 22:10:30 Re: Why the index is not used ?
Previous Message Vladimir Ryabtsev 2018-10-07 19:32:46 Re: Why the index is not used ?