Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3

From:	Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To:	psycopg(at)lists(dot)postgresql(dot)org, "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject:	Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
Date:	2024-02-16 10:47:29
Message-ID:	Zc89QYhCp0fM6GQB@hermes.hilbert.loc
Lists:	psycopg

On Thu, Feb 15, 2024 at 11:45:15PM -0600, Karl O. Pinc wrote:

> Today there is no substitute for knowing the encoding of the
> text your application obtains from the outside world.
> This can be highly system dependent because when reading
> files open()-ed as text, Python decodes (into UTF-8) the bytes read.

Not quite. Python assumes the bytes in the file *are* encoded
in whatever encoding is passed to open() (which may well be
UTF-8). It then decodes said bytes into *unicode code
points*. If we want them back as UTF-8 we need to encode them
as such.
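
A minimal sketch of that round trip, assuming a throwaway temp
file and Latin-1 as the external encoding (both picked purely for
illustration, they are not from the original discussion):

```python
import os
import tempfile

# Bytes as they might arrive from the outside world: "Grüße" in Latin-1.
raw = "Grüße".encode("latin-1")

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Text mode: Python decodes the file's bytes using the encoding we name.
with open(path, encoding="latin-1") as f:
    text = f.read()
os.remove(path)

# The result is a str: a sequence of unicode code points, not UTF-8 bytes.
code_points = [hex(ord(c)) for c in text]

# UTF-8 only appears once we explicitly encode.
as_utf8 = text.encode("utf-8")

print(len(raw), len(text), len(as_utf8))
```

The str has five code points, while the Latin-1 and UTF-8 byte
strings differ in both length and content.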

> By default decoding from the system locale's character encoding.
> And when writing files open()-ed as text Python encodes (from UTF-8)

Again, from unicode code points. See:

https://docs.python.org/3/howto/unicode.html

> No matter how you get your data, to put your data into
> the database as text, its bytes must first have their external
> encoding decoded to UTF-8. Because Python strings are
> UTF-8.

unicode code points, but, yeah

> Once in Python, psycopg converts the UTF-8 text to the database

unicode
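
To illustrate what "converting from unicode to a target encoding"
can look like when it fails, here is a sketch using plain
str.encode(). It does not call psycopg at all; Latin-1 merely
stands in, as an assumption, for a target encoding that cannot
represent one of the characters:

```python
text = "price: 10 \u20ac"   # ends with EURO SIGN (U+20AC)

try:
    text.encode("latin-1")  # Latin-1 has no euro sign
except UnicodeEncodeError as exc:
    # The exception carries enough detail to report what went wrong.
    info = {
        "encoding": exc.encoding,
        "bad_chars": exc.object[exc.start:exc.end],
        "position": (exc.start, exc.end),
        "reason": exc.reason,
    }
    print(info)

# The same code points encode fine to UTF-8.
ok = text.encode("utf-8")
```

The attributes used (encoding, object, start, end, reason) are the
standard ones on UnicodeEncodeError.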

> It's important to get the encoding right so I think it'd be
> good to talk about it.

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

In response to

Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3 at 2024-02-16 05:45:15 from Karl O. Pinc
