Quick Links

Re: Decision by Monday: PQescapeString() vs. encoding violation

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Cc:	andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: Decision by Monday: PQescapeString() vs. encoding violation
Date:	2025-02-15 20:10:57
Message-ID:	30c84c5fa464f367cbad1d3dc8088342795f96d2.camel@j-davis.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Expanding on the reasoning a bit:

This discussion is only relevant only when the application meets the
following conditions:

A. Sends invalidly-encoded input to an escaping routine. Many
languages protect against this, such as python and rust. But other
languages, like C, Go, and Ruby do not.

B. Uses PQescapeStringConn() and ignores the error, or an escaping
routine that doesn't provide an error at all. (Earlier versions of
PQescapeStringConn ignored invalid sequences in the middle of the
string, but 5dc1e42b4f fixed that.)

C. Does some kind of post-processing to the escaped output to remove
invalidly-encoded data before sending it to the server. If the invalid
data wasn't removed, the server would throw an immediate error anyway.

Given that the situation already involves A, B & C, it's already
somewhat of a special case and I don't consider it a clear-and-present
security threat beyond what was fixed for the CVE. Instead, this is
more about following an accepted practice that is less likely to result
in security problems even if the application code is imperfect.

On Fri, 2025-02-14 at 17:27 -0800, Noah Misch wrote:
> Question for all: would you switch to the "remove fewer bytes"
> behavior in
> next week's releases, switch later, or switch never? Why so?

The reasoning why it's the right change are:

1. It seems reasonable to assume most invalid byte sequences in the
middle of a string usually resulted from some kind of byte splicing or
concatenation. For instance, an application taking untrusted input
bytes, validating only at the byte level, and then concatenating with
other text (which might be done by string interpolation or by applying
a template).

- Given the above assumption, valid initial bytes in a byte sequence
are much more likely to be the start of a new valid character than part
of the previous invalid character.

- If it's the start of a new valid character, it might be an
important delimiter, and removing it could have security consequences,
perhaps not for Postgres itself, but for whatever the application does
with that string (with a missing delimiter) later.

2. The comparable risk on the other side seems substantially lower.
Let's say your application considers 0x7C ('|') to be a very important
delimiter. If the untrusted input contains, e.g. "\xC2\x7C" (invalid
sequence), one could imagine PQescapeStringConn() saving the day by
removing both bytes. But that seems more like an accident than anything
else: the application needs to validate the untrusted input regardless;
otherwise the attacker could just supply a valid '|'.

- Caveat: the application's validation could be broken in such a way
that it skips over multibyte characters without checking for invalid
sequences, i.e. seeing the \xC2 and skipping over the \x7C without
escaping it. Unfortunately, that's exactly what PQescapeStringConn()
did until recently, so this is a reasonable argument in favor of
"remove more bytes". But we shouldn't favor multibyte-half-aware
validators at the cost of breaking byte-by-byte validators.

3. Unicode strongly recommends that we take the "remove fewer bytes"
option. They are in a good position to have an informed opinion about
the security consequences of either choice, so this is partially an
appeal to authority. But that also means that it's the way other string
handling functions are likley to behave, e.g. Ruby's String#scrub seems
to take the "remove fewer bytes" approach. Following convention has its
own security benefits.

Regards,
Jeff Davis

In response to

Decision by Monday: PQescapeString() vs. encoding violation at 2025-02-15 01:27:12 from Noah Misch

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeff Davis	2025-02-15 20:35:45	Re: Decision by Monday: PQescapeString() vs. encoding violation
Previous Message	Andres Freund	2025-02-15 20:09:48	Re: Decision by Monday: PQescapeString() vs. encoding violation