Re: encoding and LC_COLLATE

From: LPlateAndy <andy(at)centremaps(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: encoding and LC_COLLATE
Date: 2011-11-15 16:23:27
Message-ID: 002601cca3b2$7622cc80$62686580$@co.uk
Lists: pgsql-general

Hi Mark (and Adrian),

As an update, I've now found that the same data also fails on my Postgres 8, which doesn't seem to have the LC_COLLATE etc. settings and is just UTF-8, so I guess the problem is possibly in the way the data is getting passed in.

This is the error message from postgres 9.0 with the LC_COLLATE as previously described:

===============================================

ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
CONTEXT: COPY pointsofinterest, line 2

********** Error **********

ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
SQL state: 22021
Context: COPY pointsofinterest, line 2

===============================================
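The three bytes in that message can be checked outside the database. The following is a minimal sketch, assuming only the hex value reported above (0xe92922); the helper name `utf8_error` is made up for illustration. It shows that the bytes are rejected as UTF-8 but are readable when treated as Windows-1252, which is a guess at the source encoding and not something the error itself proves.

```python
# The three bytes reported by PostgreSQL 9.0 in the error message.
raw = bytes.fromhex("e92922")


def utf8_error(data: bytes):
    """Return the UnicodeDecodeError raised when decoding as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = utf8_error(raw)
# Assumption: try Windows-1252 as a candidate source encoding.
as_cp1252 = raw.decode("cp1252")

print("UTF-8 decode error:", err)
print("Read as cp1252:", repr(as_cp1252))
```

In UTF-8, 0xe9 announces a three-byte character, so the two bytes after it would have to be continuation bytes. Here they are 0x29 and 0x22, which are a plain `)` and `"`, so the sequence is rejected.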

This is the error message from Postgres 8.1 with just UTF-8 set:

===============================================

ERROR: invalid UTF-8 byte sequence detected near byte 0xe9
CONTEXT: COPY pointsofinterest, line 2, column street_name: "Near Café)"

===============================================

Does that help? Is there an easy way to check exactly what encoding an existing piece of data is in?
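There is no fully reliable way to detect an encoding from the bytes alone, but it is easy to find the lines that are not valid UTF-8 and see how they read under a few candidate encodings. The following is a rough sketch only: the function name `find_non_utf8_lines`, the candidate list, and the sample row are all assumptions made for illustration, not the real pointsofinterest data.

```python
def find_non_utf8_lines(data: bytes, candidates=("cp1252", "latin-1", "cp850")):
    """Return (line_number, byte_offset, bad_byte, guesses) for each line
    that is not valid UTF-8. `guesses` maps each candidate encoding to the
    decoded text, or None if that encoding cannot decode the line."""
    results = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            guesses = {}
            for enc in candidates:
                try:
                    guesses[enc] = line.decode(enc)
                except UnicodeDecodeError:
                    guesses[enc] = None
            results.append((lineno, exc.start, line[exc.start], guesses))
    return results


# Hypothetical two-line sample with a lone 0xe9 byte on line 2.
sample = b'id,street_name\n1,"(Near Caf\xe9)"\n'

report = find_non_utf8_lines(sample)
for lineno, offset, bad_byte, guesses in report:
    print(f"line {lineno}, offset {offset}: byte 0x{bad_byte:02x}")
    for enc, text in guesses.items():
        print(f"  as {enc}: {text!r}")
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed in the same way. Whichever candidate produces sensible text ("Café" rather than something garbled) is the likely source encoding, and setting `client_encoding` to match before running COPY is one way to have the server convert it.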

Thanks again for your help so far...

Andy

From: Mark Watson-12 [via PostgreSQL] [mailto:ml-node+s1045698n4992336h40(at)n5(dot)nabble(dot)com]
Sent: 14 November 2011 20:29
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE

From: [hidden email]
[mailto:[hidden email]] On behalf of Adrian Klaver
>Sent: 14 November 2011 13:03
>...
>
>Second is the data coming in actually UTF8 or some other encoding?
>...

Hi Andy,
I have to agree with Adrian that the data may be coming in under a
different encoding. An e acute is a valid character in the 1252 encoding.
However, if the source computer is using, for example, code page 850, an e
acute is hex(82), whereas the equivalent in 1252 is hex(e9). UTF-8 "doesn't
like" either byte on its own; in UTF-8 an e acute is the two-byte sequence
hex(c3 a9).
HTH,
Mark
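The byte values Mark mentions can be confirmed with a short sketch. It uses only Python's standard codecs; the variable names are arbitrary.

```python
E_ACUTE = "\u00e9"  # the character é

# Encode the same character under each code page mentioned in the thread.
encoded = {enc: E_ACUTE.encode(enc) for enc in ("cp850", "cp1252", "utf-8")}
for enc, data in encoded.items():
    print(f"{enc:7s} -> {data.hex()}")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The single-byte forms from cp850 and cp1252 are not valid UTF-8 by themselves.
print(is_valid_utf8(encoded["cp850"]), is_valid_utf8(encoded["cp1252"]))
```

The byte in the error messages is 0xe9 and not 0x82, which points towards a 1252-style (or Latin-1) source instead of code page 850, although that remains an inference from a single character.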

