Re: invalidly encoded strings

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: invalidly encoded strings
Date:	2007-09-10 14:14:34
Message-ID:	3872.1189433674@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Those should be checked already --- if not, the right fix is still to
>> fix it there, not in per-datatype code. I think we are OK though,
>> eg see "need_transcoding" logic in copy.c.

> Well, a little experimentation shows that we currently are not OK:
> in foo.data:
> \366\66

I looked at this and found that the problem is that
CopyReadAttributesText() does backslash conversions and doesn't validate
the result.

My proposal at this point is that both scan.l and CopyReadAttributesText
need to guarantee that the result of de-escaping is valid in the
database encoding: they should reject \0 and check validity of multibyte
characters. You could optimize this fairly well (important in the COPY
case) by not redoing verifymbstr unless you'd seen at least one
numerical backslash-escape that set the high bit.
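
The optimization described above can be sketched in standalone C. This is only an illustration, not the code in copy.c or scan.l: the names sketch_deescape and sketch_utf8_valid are hypothetical, the database encoding is assumed to be UTF-8, the validator is a simplified stand-in for verifymbstr, and the raw input is assumed to have been encoding-checked before de-escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the server's verifymbstr.
 * Checks lead bytes and continuation bytes only; it does not catch
 * every overlong or surrogate sequence.
 */
static bool sketch_utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;

        if (c < 0x80) n = 1;
        else if (c >= 0xC2 && c <= 0xDF) n = 2;
        else if (c >= 0xE0 && c <= 0xEF) n = 3;
        else if (c >= 0xF0 && c <= 0xF4) n = 4;
        else return false;              /* not a legal lead byte */

        if (i + n > len) return false;  /* truncated sequence */
        for (size_t k = 1; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;
        i += n;
    }
    return true;
}

/*
 * De-escape src into dst (which must hold strlen(src) + 1 bytes).
 * Returns the output length, or -1 if the result would contain a
 * zero byte or is not valid in the assumed encoding.
 * Assumption: the raw input was already validated, so the result is
 * re-verified only if an octal escape produced a high-bit byte.
 */
static long sketch_deescape(const char *src, unsigned char *dst)
{
    bool saw_high_bit_escape = false;
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (const char *p = src; *p; p++) {
        if (*p != '\\') {
            dst[out++] = (unsigned char) *p;
            continue;
        }
        p++;                            /* look at the char after the backslash */
        if (*p >= '0' && *p <= '7') {
            int val = 0;
            int digits = 0;

            while (digits < 3 && *p >= '0' && *p <= '7') {
                val = val * 8 + (*p - '0');
                p++;
                digits++;
            }
            p--;                        /* the for loop will advance again */
            val &= 0xFF;                /* keep one byte, as a sketch choice */
            if (val == 0)
                return -1;              /* reject \0 */
            if (val & 0x80)
                saw_high_bit_escape = true;
            dst[out++] = (unsigned char) val;
        } else if (*p == '\0') {
            dst[out++] = '\\';          /* trailing backslash kept literally */
            break;
        } else {
            dst[out++] = (unsigned char) *p;  /* other escapes: take char as-is */
        }
    }
    dst[out] = '\0';

    /* Only pay for re-verification when an escape could have broken validity. */
    if (saw_high_bit_escape && !sketch_utf8_valid(dst, out))
        return -1;
    return (long) out;
}
```

With these assumptions, the reported input \366\66 is rejected because 0xF6 is not a legal UTF-8 lead byte, while a line with no high-bit escapes skips the second validation pass entirely.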

regards, tom lane

In response to

Re: invalidly encoded strings at 2007-09-09 17:18:23 from Andrew Dunstan
