Re: sandboxing untrusted code

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: sandboxing untrusted code
Date: 2023-09-05 16:25:28
Message-ID: CA+TgmobN===Gp+jRw6yt8wLM7xB=Z+mAVtUeoy6Uthwe00-dpA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 1, 2023 at 5:27 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Which privileges are available in a sandboxed environment, exactly? Is
> it kind of like masking away all privileges except EXECUTE, or are
> other privileges available, like SELECT?

I think I've more or less answered this already -- fully sandboxed
code can't make reference to external data sources, from which it
follows that it can't exercise SELECT (and most other privileges).

> And the distinction that you are drawing between having the privileges
> but them (mostly) not being available, versus not having the privileges
> at all, is fairly subtle. Some examples showing why that distinction is
> important would be helpful.

I view it like this: when Bob tries to insert into, update, or delete
from Alice's table, and Alice has some code attached to it, Alice is
effectively asking Bob to execute that code with his own privileges.
In general, I think we can reasonably expect that Bob WILL be willing
to do this: if he didn't want to modify Alice's table, he
wouldn't have executed a DML statement against it, and executing the
code that Alice has attached to that table is a precondition of being
allowed to perform that modification. It's Alice's table and she gets
to set the rules. However, Bob is also allowed to protect himself. If
he's running Alice's code and it wants to do something with which Bob
isn't comfortable, he can change his mind and refuse to execute it
after all.

I always find it helpful to consider real world examples with similar
characteristics. Let's say that Bob is renting a VRBO from Alice.
Alice leaves behind, in the VRBO, a set of rules which Bob must follow
as a condition of being allowed to rent the VRBO. Those rules include
things that Bob must do at checkout time, like washing all of his
dishes. As a matter of routine, Bob will follow Alice's checkout
instructions. But if Alice includes in the checkout instructions
"Leave your driver's license and social security card on the dining
room table after checkout, plus a record of all of your bank account
numbers," the security systems in Bob's brain should activate and
prevent those instructions from getting followed.

A major difference between that situation (a short term rental of
someone else's house) and the in-database case (a DML statement
against someone else's table) is that when Bob is following Alice's
VRBO checkout instructions, he knows exactly what actions he is
performing. When he executes a DML statement against Alice's table,
Bob the human being does not actually know what Alice's triggers or
index expressions or whatever are causing him to do. As I see it, the
purpose of this system is to prevent Bob from doing things that he
didn't intend to do. He's cool with adding 2 and 2 or concatenating
some strings or whatever, but probably not with reading data and
handing it over to Alice, and definitely not handing all of his
privileges over to Alice. Full sandboxing has to block that kind of
stuff, and it needs to do so precisely because *Bob would not allow
those operations if he knew about them*.

Now, it is not going to be possible to get that perfectly right.
PostgreSQL cannot know the state of Bob's human mind, and it cannot
be expected to judge with perfect accuracy what actions Bob would or
would not approve. However, it can make some conservative guesses. If
Bob wants to override those guesses by saying "I trust Alice, do
whatever she says," that's fine. This system attempts to prevent Bob
from accidentally giving away his permissions to an adversary who has
buried malicious code in some unexpected place. But, unlike the
regular permissions system, it is not there to prevent Bob from doing
things that he isn't allowed to do. It's there to prevent Bob from
doing things that he didn't intend to do.

And that's where I see the distinction between *having* permissions
and those permissions being *available* in a particular context. Bob
has permission to give Alice an extra $1000 or whatever if he has the
money and wishes to do so. But those permissions are probably not
*available* in the context where Bob is following a set of
instructions from Alice. If Bob's brain spontaneously generated the
idea "let's give Alice a $1000 tip because her vacation home was
absolutely amazing and I am quite rich," he would probably go right
ahead and act on that idea and that is completely fine. But when Bob
encounters that same idea *on a list of instructions provided by
Alice*, the same operation is blocked *because it came from Alice*. If
the list of instructions from Alice said to sweep the parlor, Bob
would just go ahead and do it. Alice has permission to induce Bob to
sweep the parlor, but does not have permission to induce Bob to give
her a bunch of extra money.

And in the database context, I think it's fine if Alice induces Bob to
compute some values or look at the value of work_mem, but I don't
think it's OK if Alice induces Bob to make her a superuser. Unless Bob
declares that he trusts Alice completely, in which case it's fine if
she does that.
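To make the *having* versus *available* distinction concrete, here is a
minimal Python sketch. It is only a toy model under made-up assumptions:
the operation categories, the `Role` class, and `is_available()` are
hypothetical, not PostgreSQL's privilege system or any proposed patch.
A privilege is usable only if Bob actually holds it, and when the code
being run was written by someone Bob hasn't declared he trusts, only
harmless operations (computation, reading a setting) remain available.

```python
from dataclasses import dataclass, field

# Hypothetical operation categories for illustration only; these are not
# PostgreSQL privilege names.
COMPUTE = "compute"            # e.g. adding 2 and 2, concatenating strings
READ_SETTING = "read_setting"  # e.g. looking at the value of work_mem
SELECT = "select"              # reading from an external data source
GRANT_ROLE = "grant_role"      # e.g. making someone a superuser

# Operations assumed to stay available even inside someone else's code.
SANDBOX_SAFE = {COMPUTE, READ_SETTING}


@dataclass
class Role:
    name: str
    held: set = field(default_factory=set)    # privileges the role *has*
    trusts: set = field(default_factory=set)  # owners whose code runs unsandboxed


def is_available(user: Role, code_owner: Role, op: str) -> bool:
    """Can `user` perform `op` while running code written by `code_owner`?"""
    # Regular permission check: a privilege that isn't held is never usable.
    if op not in SANDBOX_SAFE and op not in user.held:
        return False
    # Bob's own code, or code from someone he trusts completely, may use
    # every privilege Bob holds.
    if code_owner.name == user.name or code_owner.name in user.trusts:
        return True
    # Otherwise the code is sandboxed: held privileges are not *available*.
    return op in SANDBOX_SAFE


bob = Role("bob", held={SELECT, GRANT_ROLE})
alice = Role("alice")
print(is_available(bob, alice, COMPUTE))     # True
print(is_available(bob, alice, GRANT_ROLE))  # False
bob.trusts.add("alice")
print(is_available(bob, alice, GRANT_ROLE))  # True
```

Note that Bob holds GRANT_ROLE the whole time; what changes is only
whether Alice's code is allowed to make use of it.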

> Here I'm getting a little lost in what you mean by "prohibited
> operation". Most languages mostly use SPI, and whatever sandboxing
> checks you do should work there, too. Are you talking about completely
> separate side effects like writing files or opening sockets?

Yeah.
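Side effects that never go through SPI, such as writing files or opening
sockets, can't be caught by checks inside SPI, so a language handler
would presumably have to consult the sandbox before performing them. A
minimal Python sketch, assuming a hypothetical `check_side_effect()` hook
and a context variable recording whether the currently executing code is
sandboxed; none of these names exist in PostgreSQL or any PL. It also
assumes, conservatively, that code called from sandboxed code stays
sandboxed.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical model, not a real PostgreSQL or PL API: records whether the
# code currently executing is sandboxed.
_sandboxed = contextvars.ContextVar("sandboxed", default=False)


class SandboxViolation(Exception):
    """Raised when sandboxed code attempts a prohibited side effect."""


@contextmanager
def run_code_owned_by(owner, invoker, trusted=frozenset()):
    """Execute a block as `invoker` running code that `owner` wrote."""
    sandbox = owner != invoker and owner not in trusted
    # Assumption: once sandboxed, nested calls stay sandboxed.
    token = _sandboxed.set(_sandboxed.get() or sandbox)
    try:
        yield
    finally:
        _sandboxed.reset(token)


def check_side_effect(kind):
    """Hook a language handler would call before a side effect that bypasses SPI."""
    if _sandboxed.get():
        raise SandboxViolation(f"{kind} is not available in sandboxed code")


def open_socket(host):
    check_side_effect("opening a socket")
    return f"connected to {host}"  # stand-in for the real side effect


with run_code_owned_by("alice", "bob"):
    try:
        open_socket("example.com")
    except SandboxViolation as e:
        print(e)  # opening a socket is not available in sandboxed code
```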

--
Robert Haas
EDB: http://www.enterprisedb.com
