Vacuuming unicode database

From: "Tambet Matiisen" <t(dot)matiisen(at)aprote(dot)ee>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Vacuuming unicode database
Date: 2003-08-13 15:46:11
Message-ID: 81132473206F3A46A72BD6116E1A06AE1B15C8@black.aprote.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


My Postgres databases used to have default (SQL_ASCII) encoding. I could
store any 8-bit character in it regardless of actual charset, because
all clients also used default encoding and no charset conversion was
done.

Now we started a new project with Qt3 and it's Postgres driver defaults
to UNICODE client encoding. This time charset conversion comes to play
and messes up all 8-bit characters. This forces me to create all
databases with correct charsets. So I recreated my database with UNICODE
encoding and imported all data in it, paying attention to client
encoding. I didn't re-initdb whole cluster, only recreated problematic
database. Everything seems to work fine, only I can't vacuum the
database:

epos=# vacuum verbose analyze;
INFO: --Relation pg_catalog.pg_description--
INFO: Pages 18: Changed 0, Empty 0; Tup 1895: Vac 0, Keep 0, UnUsed 5.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: --Relation pg_toast.pg_toast_16416--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Analyzing pg_catalog.pg_description
INFO: --Relation pg_catalog.pg_group--
INFO: Pages 1: Changed 0, Empty 0; Tup 31: Vac 0, Keep 0, UnUsed 31.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: --Relation pg_toast.pg_toast_1261--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Analyzing pg_catalog.pg_group
ERROR: Invalid UNICODE character sequence found (0xdc6b)

Table pg_group is giving errors, because I have group name with 8-bit
characters in it. As I understand, groups are common for all databases
and pg_group is created during initdb, so it should be considered having
SQL_ASCII charset, not UNICODE. Seems like a bug to me?

What would you suggest in this case:
1. Re-initdb with UNICODE encoding and recreate all databases. Basically
all databases should have the same encoding.
2. Use some single-byte encoding for database, instead of UNICODE.
Vacuuming wouldn't complain any more, but I have some doubts that CREATE
GROUP "Name with 8-bit characters" behaves differently depending on
encoding of the active database.

Tambet

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alexander Rüegg 2003-08-13 16:02:05 Re: Tsearch2 lexeme position
Previous Message Josh Berkus 2003-08-13 15:34:58 Re: Commercial support?