Re: Bug in UTF8-Validation Code?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Fuhr <mike(at)fuhr(dot)org>, Mario Weilguni <mweilguni(at)sime(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-03-17 17:51:02
Message-ID: 12557.1174153862@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Actually, I have to take back that objection: on closer look, COPY
> validates the data only once and does so before applying its own
> backslash-escaping rules. So there is a risk in that path too.

> It's still pretty annoying to be validating the data twice in the
> common case where no backslash reduction occurred, but I'm not sure
> I see any good way to avoid it.

Further thought here: if we put encoding verification into textin()
and related functions, could we *remove* it from COPY IN, in the common
case where client and server encodings are the same? Currently, copy.c
forces a trip through pg_client_to_server for multibyte encodings
even when the encodings are the same, so as to perform validation.
But I'm wondering whether we'd still need that. There's no risk of
SQL injection in COPY data. Bogus input encoding could possibly
make for confusion about where the field boundaries are, but bad
data is bad data in any case.
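
To make the ordering issue concrete, here is a rough Python model of the two code paths. It is a sketch only, not the actual copy.c or textin() code. Every name in it (reduce_backslashes, textin_sketch, copy_field_current, copy_field_proposed) is hypothetical. The server encoding is assumed to be UTF8, only octal and doubled-backslash escapes are modelled, and field splitting is left out entirely.

```python
import re

SERVER_ENCODING = "utf-8"  # assumption: the server encoding is UTF8

# Hypothetical stand-in for COPY's backslash handling. Only octal escapes
# (\ooo) and doubled backslashes are reduced; real COPY accepts more forms.
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|\\)")


def reduce_backslashes(raw: bytes) -> bytes:
    def repl(match):
        body = match.group(1)
        if body == b"\\":
            return b"\\"
        return bytes([int(body, 8) & 0xFF])

    return _ESCAPE.sub(repl, raw)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def textin_sketch(data: bytes) -> str:
    # Model of a textin() that verifies the encoding of its own input.
    if not is_valid_utf8(data):
        raise ValueError("invalid byte sequence for encoding UTF8")
    return data.decode("utf-8")


def copy_field_current(raw: bytes) -> bytes:
    # Order described in the quoted text: validate once, then reduce
    # backslashes. Bytes produced by the escapes are never checked.
    if not is_valid_utf8(raw):
        raise ValueError("invalid byte sequence for encoding UTF8")
    return reduce_backslashes(raw)


def copy_field_proposed(raw: bytes, client_encoding: str) -> str:
    # Proposed order: convert only when the encodings differ, reduce
    # backslashes, and leave validation to the input function.
    if client_encoding != SERVER_ENCODING:
        raw = raw.decode(client_encoding).encode(SERVER_ENCODING)
    return textin_sketch(reduce_backslashes(raw))
```

With the input bytes `abc\377` (a literal backslash followed by the digits 377), copy_field_current sees only ASCII, accepts the data, and then produces a 0xFF byte that is not valid UTF8. copy_field_proposed rejects the same input, because the check runs after the escape has been reduced, and it does so with a single validation pass when the client and server encodings match.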

regards, tom lane
