Re: can we mark upper/lower/textlike functions leakproof?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: can we mark upper/lower/textlike functions leakproof?
Date: 2024-08-01 14:43:54
Message-ID: CA+TgmoZ8ACWegnaJji-z+KOLJuz-M9xSUFYTa4nngd6_T7Hihw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 1, 2024 at 10:05 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
> Sure. Of course you should be monitoring your production servers for
> anomalous workloads, no? "Gee, why is Joe running the same query
> millions of times that keeps throwing errors? Maybe we should go see
> what Joe is up to"

I think it's possible that something like this could be part of a
useful approach, but it's difficult. If it would take Joe a
month of pounding on the server to steal enough data to matter, then I
think monitoring could be one required part of an information
protection strategy. However, if Joe, or someone with Joe's
credentials, can steal all the secret data in under an hour, the
monitoring system probably doesn't help much. A human being probably
won't react quickly enough to stop something bad from happening,
especially if the person with Joe's credentials begins the attack at
2am on Christmas.
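
To make Joe's scenario concrete, the repeated error-throwing workload he
alludes to might look roughly like the sketch below. Everything in it
(the table, the probe() function, and the assumption that its failure
behavior depends on its argument) is hypothetical and only illustrates
the shape of the attack, not anything proposed in this thread:

    -- Hypothetical: probe(text, text) stands in for any function that is
    -- marked LEAKPROOF but whose errors actually depend on its input.
    -- Each query either errors or it doesn't, leaking one bit about rows
    -- that the row-level security policy is supposed to hide.
    SELECT count(*) FROM protected_table WHERE probe(secret_col, 'guess-0001');
    SELECT count(*) FROM protected_table WHERE probe(secret_col, 'guess-0002');
    -- ... millions of guesses later, the hidden values are recovered.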

More generally, I think it's difficult for us to build infrastructure
into PostgreSQL that relies on complex assumptions about what the
customer environment is. To some extent, we are already relying on
users to prevent certain types of attacks. For example, RLS supposes
that timing attacks or plan-shape-based attacks won't be feasible, but
we don't do anything to prevent them; we just hope the user takes care
of it. That's already a shaky assumption, because timing attacks could
well be feasible across a fairly deep application stack: e.g., the user
issues an HTTP request for a web page and can detect variations in the
latency of the underlying database query.
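
For readers who want the mechanics behind that assumption, here is a
minimal sketch; the table, policy, and data are invented for
illustration, while the ALTER FUNCTION form and the planner behavior
noted in the comments are the standard leakproof/RLS interaction:

    -- Illustrative schema; the names are made up.
    CREATE TABLE accounts (owner text, balance numeric);
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
    CREATE POLICY own_rows ON accounts USING (owner = current_user);

    -- The proposal in the subject line amounts to catalog changes like:
    ALTER FUNCTION upper(text) LEAKPROOF;   -- superuser only

    -- Once every function and operator in a user-supplied qual is
    -- leakproof, the planner may evaluate that qual ahead of the policy
    -- qual, e.g. for
    --   SELECT * FROM accounts WHERE upper(owner) LIKE 'A%';
    -- so any error, NOTICE, or timing variation it produces can reflect
    -- rows the policy was supposed to filter out.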

When you start proposing assumptions that the user can't execute DDL,
or can't execute SQL queries, or that there's monitoring of the error
log in place, I feel the whole thing gets very hard to reason about.
First, you have to decide on exactly what the assumptions are - no
DDL, no direct SQL at all, something else? Different situations could
exist for different users, so whatever assumption we make will not
apply to everyone. Second, for some of this stuff, there's a sliding
scale. If we stipulate that a user is going to need a monitoring
system, how good does that monitoring system have to be? What does it
have to catch, and how quickly are the humans required to respond? If
we stipulate that the attacker can't execute SQL directly, how much
control over the generated SQL are they allowed to have?

I don't want to make it sound like I think it's hopeless to come up
with something clever here. The current situation kind of sucks, and I
do have hopes that there are better ideas out there. At the same time,
we need to be able to articulate clearly what we are and are not
guaranteeing and under what set of assumptions, and it doesn't seem
easy to me to come up with something satisfying.

--
Robert Haas
EDB: http://www.enterprisedb.com
