
Re: pg_dump, pg_restore and UTF8: invalid byte sequence

From:	<me(at)alternize(dot)com>
To:	<me(at)alternize(dot)com>, <pgsql-novice(at)postgresql(dot)org>
Subject:	Re: pg_dump, pg_restore and UTF8: invalid byte sequence
Date:	2006-10-17 03:23:02
Message-ID:	054401c6f19b$8e5c8210$6501a8c0@iwing
Lists:	pgsql-novice

> shouldn't pg_dump encode the utf8 bytesequences?

at least i found out why the invalid unicode sequences appear in the first
place: tsearch2 in 8.1 doesn't handle utf8 characters properly. it lowercases
a multibyte character byte for byte instead of as a whole character.
for example: "ä", which is encoded as the two bytes 0xC3 0xA4 (shown as "Ã¤"
when read as latin1), is written to the db by tsearch2 as 0xE3 0xA4 ("ã¤"),
which is an invalid utf8 byte sequence.
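
a rough python sketch of the effect, not tsearch2's actual code: it assumes the
lowercasing behaves like a single-byte latin1 case conversion applied to each
byte of the utf8 string. the function names are made up for illustration.

```python
def lowercase_bytewise_latin1(data: bytes) -> bytes:
    """Lowercase each byte as if it were a Latin-1 character.

    Assumption: this mimics a single-byte-locale tolower() applied to
    UTF-8 data; it is not taken from the tsearch2 source.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "ä".encode("utf-8")                 # the two bytes 0xC3 0xA4
mangled = lowercase_bytewise_latin1(original)  # 0xC3 becomes 0xE3

print(original.hex(), is_valid_utf8(original))
print(mangled.hex(), is_valid_utf8(mangled))
```

0xE3 is the lead byte of a 3-byte utf8 sequence, so a single continuation byte
(0xA4) after it leaves the sequence incomplete, which is why the result is
rejected.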

stripping the ts2 index column before dumping fixes the encoding problems. i
guess the 8.2 -> 8.1.5 backport should fix it as well; i'll try it asap.
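
to see which rows are affected before stripping anything, one option is to scan
a plain-text dump for lines that are not valid utf8. a minimal sketch, assuming
a plain sql dump (not the custom archive format); the helper name and the
sample data are hypothetical:

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, raw_line) for every line that is not valid UTF-8."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


# Made-up sample standing in for the lines of a dump file.
sample_dump = [
    b"plain ascii line\n",
    "gr\u00fc\u00dfe\n".encode("utf-8"),   # valid multibyte text
    b"bad \xe3\xa4 value\n",               # the broken sequence from above
]

bad = list(find_invalid_utf8_lines(sample_dump))
for number, line in bad:
    print(number, line)
```

for a real dump, the same function can be fed a file opened in binary mode,
since iterating over such a file yields one bytes object per line.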

> also, regarding pg_restore, its quite troubling it has the same
> parameter-set as pg_dump

never mind this, it is too late in the evening 8-)

- thomas

In response to

pg_dump, pg_restore and UTF8: invalid byte sequence at 2006-10-17 01:20:31 from me
