From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Markus Wollny <Markus(dot)Wollny(at)computec(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: invalid UTF-8 byte sequence detected |
Date: | 2005-11-15 18:46:02 |
Message-ID: | 200511151846.jAFIk2F19608@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I am also confused how invalid UTF8 sequences got into your database.
It shouldn't have been possible.
---------------------------------------------------------------------------
Markus Wollny wrote:
> Hello!
>
> I am currently testdriving migration of our PostgreSQL 8.0 databases to 8.1; in this process I have stumbled a couple of times over certain errors in text-fields that lead to error-messages during import of the dump like these:
>
> <2005-11-09 14:57:34 CET - 9354: [local](at)community_unicode>ERROR: invalid UTF-8 byte sequence detected near byte 0xb4
> <2005-11-09 14:57:34 CET - 9354: [local](at)community_unicode>CONTEXT: COPY board_message, line 1125662, column text: "HI
>
> Besteht ein gewisser Nachteil, wenn ich nur eins von den beiden kaufe, da in beiden Addon?s viel..."
> <2005-11-09 14:57:34 CET - 9354: [local](at)community_unicode>STATEMENT: COPY board_message (board_id, thread_id, father_id, message_id, user_id, title, signat
> ure, follow_up, count_reply, last_reply, created, article_id, logged_ip, state_id, user_login, user_status, user_rank, user_rank_description, user_rank_picture, user_rights, text, deleted_user_id, deleted_date, deleted_login, user_created, poll_id, idxfti) FROM stdin;
>
> <2005-11-09 14:57:49 CET - 9354: [local](at)community_unicode>ERROR: invalid UTF-8 byte sequence detected near byte 0x98
> <2005-11-09 14:57:49 CET - 9354: [local](at)community_unicode>CONTEXT: COPY kidszone_tournament2005_user, line 427, column phone: "02302?74"
> <2005-11-09 14:57:49 CET - 9354: [local](at)community_unicode>STATEMENT: COPY kidszone_tournament2005_user (id, first_name, last_name, adress, birthday, phone,
> email, place, permission, ude, ude_box, invited) FROM stdin;
>
> There are not too many occurrences of the same type - five altogether in a 1.8GB compressed dumpfile, but still it has me worried and leaves me with some questions:
>
> 1.) How could I have prevented insertion of these invalid byte-sequences in the first place? We're using UTF-8 encoded databases, data is mostly inserted by users via browser applications, our websites are UTF-8 encoded, too, but still we cannot really make 100% sure that all clients behave as expected; on the other hand, it would be extremely inconvenient if we had to check each and every text input for character set conformance in the application, so is there a way to ascertain "sane" data via some database-setting? pg_restore does throw this error and indeed terminates after that (I used custom dump format for pg_dump), psql on the other hand just continues with the import (using a pgdumpall-output that generates a standard SQL-script), although it too throws the error.
>
> 2.) How does this really affect the value of the database-dumps? psql continues with import after the error, but the table where this error occurred remains empty, as the affected COPY-statement has failed altogether due to this error. So a plain no-worries import in my case would present me a result with five tables empty - one of them quite large... Is there some kind of magic, maybe involving some perl or whatever, that could help to clean up the dump before the import, so I can accomplish a full restore?
>
> Kind regards,
>
> Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From | Date | Subject | |
---|---|---|---|
Next Message | john.bender | 2005-11-15 18:51:49 | Customizing the Windows installer |
Previous Message | Jerry LeVan | 2005-11-15 18:44:32 | Cursors or Offset, Limit? |