Re: Questions for Meredith

From: "Meredith L(dot) Patterson" <mlp(at)thesmartpolitenerd(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: Questions for Meredith
Date: 2007-01-13 06:50:23
Message-ID: 45A8812F.9070407@thesmartpolitenerd.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

Josh Berkus wrote:
> Now that your slides are up, let me give you the list of comments and
> questions from the PUG that you asked for:

Thanks!

I'm actually going to take these one at a time, because I think the
replies are going to run long.

> 1) Suggestion: the query template notation should distinguish between
> literals and non-literals. Currently, you have:
>
> SELECT {col1} FROM table;
> and SELECT '{col}' FROM table;

First, a sidenote: wow, a literal in the target list is legal? Checking
the grammar, apparently so, since target_list => target_el, target_el =>
a_expr, and a_expr => AexprConst (which can be pretty much any literal).

In the first case you're actually not *so* screwed, because the lowest
enclosing node over {col1} is actually ColId. Marking the lowest
enclosing node with an asterisk, a partial parse (of just the targetlist
subtree of the SELECT) looks something like:

simple_select =>
target_list =>
target_el =>
a_expr =>
c_expr =>
columnref =>
relation_name =>
*ColId =>
IDENT

So the only valid elements that can appear here are tokens that have
been lexed as IDENT, or a number of tokens which fall under the
catch-alls of unreserved_keyword and col_name_keyword. By contrast,
literals are lexed as SCONST, FCONST, ICONST, BCONST ... you get the idea.

The place where this *is* profoundly dangerous is in the WHERE clause,
as I'll show below.

> ... with the same notation, allowing an attacker to try to swap a literal
> with an object reference if they can defeat the escaping mechanism,
> especially if the programmer makes makes the mistake of omitting the
> quotes or putting them inside the brackets.

I'm actually going to switch to a different example from the slides,
because today as I was digging back through the lexer I realised I'd
made a mistake in the talk. String constants, aka string literals, are
lexed *with* the quotes around them. So {'foo'} and '{foo}' are actually
equivalent; by extension, the lowest enclosing node around them would be
Sconst, e.g.:

where_clause =>
a_expr =>
c_expr =>
AexprConst =>
*Sconst =>
SCONST

This is where Quinn's suggestion of a sanity-checker function which
shows you what you're *really* allowing to vary will come in handy.

So the good news is that I actually suggested more caution than was
necessary. The bad news is that allowing an attacker to include
comparisons between literals is still dangerous, so detecting this is
still a Good Idea.

However, I did some more thinking about it, and realised that it's also
possible to construct an object-literal comparison -- or merely an
object expression which always returns true. Consider:

SELECT foo FROM bar WHERE exists(select current_database())

Or, without resorting to any built-in functions other than exists():

SELECT foo FROM bar WHERE exists(select relname from pg_class)

The exists() function returns TRUE if the subquery given to it as an
argument returns at least one row, and it is always possible to
construct a subquery which returns at least one row by virtue of being
able to query about the database itself.

So I think it makes more sense to punt back to the more cautious stance
that some of us took: if it's possible for a rule to recurse directly,
i.e., a rule's right-hand side can contain an instance of its own
left-hand side, throw an exception during construction and force the
user to handle that.

This is easily implemented; since Dejector's parser constructs an XML
parse tree, it's trivial to add a "recurses" attribute to all
self-recursive rules, then check to see whether a node which is
potentially recursive appears as (or below) a lowest enclosing node.

More difficult, but something I would like to implement at some point,
is an idea which I think stems from a suggestion Quinn made. It could be
useful to be able to inspect a rule which appears in an exemplar and see
which applications the exemplar allows. Any rule which appears above a
lowest enclosing node will only have one allowable application: the one
which appears in the exemplar. (For example, if your exemplar is a
SELECT statement, then the <stmt> rule can *only* have <SelectStmt> as
its child, provided the entire statement isn't bracketed.) But the rule
at a lowest enclosing node can use *any* of its applications; therein
lies much of the danger with bracketing an <a_expr>.

The bracket notation is convenient, but this all-or-one-or-nothing
nature somewhat betrays the philosophy behind Dejector. If the goal is
to create a restricted sublanguage, then it should be possible to
selectively disallow some rule applications. For instance, if you could
remove the rule

<a_expr> ::= <a_expr> OR <a_expr>

from your sublanguage, you've just removed the "OR 1=1" attack while
still allowing other <a_expr>s into your WHERE clause.

Comments? Criticisms? Anything I'm missing here?

(Questions 2-5 coming up soon.)

--mlp

In response to

Responses

Browse sfpug by date

  From Date Subject
Next Message Quinn Weaver 2007-01-15 01:27:17 Re: Questions for Meredith
Previous Message David Fetter 2007-01-12 23:43:35 Perl Regular Expressions