strange behaviour of extended ascii characters (LATIN1)

From: Ennio Iannucci <Ennio-Sr(at)ei(dot)hnet>
To: pgsql-bugs(at)postgresql(dot)org
Subject: strange behaviour of extended ascii characters (LATIN1)
Date: 2003-03-14 12:12:27
Message-ID: 20030314121227.GA807@deby.ei.hnet
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

To pgsql-bugs(at)postgresql(dot)org

Name: Ennio Iannucci
Email Address:eiannucci(at)tin(dot)it
Severity: Minor Annoyance (but great curiosity)
Short Description: special characters (beyond ascii char(127)) seem to
behave ina very strange way in a postgresql table.

THE FACTS:
=========
I created a pg table from a *.txt file coming from a *.dbf file (MS-DOS
codepages 437 and/or 850) containing accented vowels and other
characters beyond ascii char(127).

The psql command:
mydb=> select autore, titolo, collana from bibl where autore like
'%Tour%'; displays the special characters correctly, whereas:
mydb=> select * from bibl;
substitutes the hex char code for the character gliph, that is to say:
you get a <85> instead of char(133) [a with accent].
Moreover, when the expanded display is toggled on (\x), both the above
commands produce the same effect, i.e. <hex char codes> instead of
proper letter gliphs.

mydb=>update bibl set titolo='Rom[char(133)]' where titolo like '%Rom%';
will produce Rom[char(183)], i.e. a capital A with an accent! But,
repeating once again the same command, the result would display
correctly, i.e. the gliph for char(133). However, a new select given
after setting extended display on, will show again the accented capital
A (char(183)) instead of the correct char(133).

It may be of some interest to note that:
- I can type the desired special characters using ALT+dec.code from the
console in or outside psql (and they display correctly).
- less [| cat] orig_file.txt will show hex code
- vim orig_file.txt will display ~E (tilde+E) instead of char(133).

THESE ARE THE VARIOUS STEPS I MADE:
==================================
# su - postgres

AND AS POSTGRES:
rm -r /var/lib/postgres/data
mkdir /var/lib/postgres/data
initdb -D /var/lib/postgres/data -E latin1

THE SUGGESTED COMMAND TO START POSTGRES:
/usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
HANGED AFTER PRINTING THESE LINES:
FindExec: found "/usr/lib/postgresql/bin/postgres" using argv[0]
invoking IpcMemoryCreate(size=1949696)
FindExec: found "/usr/lib/postgresql/bin/postmaster" using argv[0]

SO I STARTED IT WITH THE COMMAND (FROM ROOT):
# /etc/init.d/postgresql start

THEN, BACK AS POSTGRES:
$ psql template1
template1=> create user john with password 'john' createdb;
\q

AND, TO CHECK WHETER OR NOT I HAD THE RIGHT CHARACTER SET:
$ /usr/lib/postgresql/bin/pg_controldata /var/lib/postgres/data
pg_control version number: 71
Catalog version number: 200201121
Database state: IN_PRODUCTION
pg_control last modified: Thu Mar 13 23:27:41 2003
Current log file id: 0
Next log file segment: 1
Latest checkpoint location: 0/2070AC
Prior checkpoint location: 0/1FA424
Latest checkpoint's REDO location: 0/2070AC
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's StartUpID: 11
Latest checkpoint's NextXID: 124
Latest checkpoint's NextOID: 24748
Time of latest checkpoint: Thu Mar 13 23:27:39 2003
Database block size: 8192
Blocks per segment of large relation: 131072
LC_COLLATE: it_IT
LC_CTYPE: it_IT

I BECAME USER JOHN AND AS SUCH:
$ psql template1
template1=> create database mydb;
template1=> \c mydb
mydb=> \i bibl_import.sql # to create the table bibl from a *.txt file
mydb=> \i bibl_crea.sql # to get rid of columns containing delimiters

THIS IS THE VERSION I'M USING:
$ psql -V
psql (PostgreSQL) 7.2.1 # pre-packaged, installed with 'apt-get install'
from Debian/Woody
contains support for: readline, history, multibyte
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996, Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.

AND THESE ARE A FEW LINES FROM THE LOG FILE AFTER STARTING POSTGRES:
/usr/lib/postgresql/bin/postmaster child[2070]: starting with (postgres
-d2 -v131072 -p template1 )
/usr/lib/postgresql/bin/postmaster child[2078]: starting with (postgres
-d2 -v131072 -p template1 )
/usr/lib/postgresql/bin/postmaster child[2081]: starting with (postgres
-d2 -v131072 -p mydb )

ARCHITECTURE:
============
Linux version 2.2.19pre17 (herbert(at)arnor) (gcc version 2.7.2.3) #1 Tue
Mar 13 22:37:59 EST 2001 Detected 167047 kHz processor.
Console: colour VGA+ 80x25
Memory: 29784k/32768k available (1744k kernel code, 412k reserved, 672k
data, 156k init) CPU: Intel Pentium MMX stepping 03
PCI: PCI BIOS revision 2.10 entry at 0xfb730
Adding Swap: 192772k swap-space (priority -1)
Linux ...... 2.2.19pre17 #1 Tue Mar 13 22:37:59 EST 2001 i586 unknown

VERSION:
=======
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4

I hope the above information will be of some help to reproduce the
reported behaviour or to let you discover where I am wrong.
Thank you for your assistance and for having produced such an intriguing
program and let me apologize in advance, just in case all depends on my
ignorance!
Best regards,
Ennio Iannucci.

Browse pgsql-bugs by date

  From Date Subject
Next Message pgsql-bugs 2003-03-14 16:26:35 Bug #909: Website docs broken
Previous Message Oliver Elphick 2003-03-14 10:22:27 Re: now() and date_trunc() apparently corrupting data