From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matthias Apitz <guru(at)unixarea(dot)de>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: \COPY to accept non UTF-8 chars in CHAR columns
Date: 2020-03-27 20:40:30
Message-ID: CA+hUKG+BAtBCXaB-0SYxNiVV3_CAbHvm7sm1PJWyhJFvTi_R3A@mail.gmail.com
Lists: pgsql-general
On Sat, Mar 28, 2020 at 4:46 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Matthias Apitz <guru(at)unixarea(dot)de> writes:
> > In short, is there a way to let \COPY accept such broken ISO bytes, just
> > complaining about them, but not stopping the insert of the row?
>
> No. We don't particularly believe in the utility of invalid data.
>
> If you don't actually care about what encoding your data is in,
> you could use SQL_ASCII as the database "encoding" and thereby
> disable all UTF8-specific behavior. Otherwise, maybe this conversion
> is a good time to clean up the mess?
Something like this approach might be useful for fixing the CSV file:
I haven't tested that program, but it looks like the right sort of
approach; I remember writing similar logic to untangle the strange
mixtures of Latin-1, Windows-1252, and UTF-8 that late-90s browsers
used to send. That sort of approach can't fix every theoretical
problem (some valid Latin-1 sequences are also valid UTF-8 sequences),
but it works well enough for text in European languages.
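As a rough illustration of that idea (not the program referenced above, just a hypothetical sketch): walk the byte stream, keep any run that decodes as valid UTF-8, and reinterpret each remaining byte as Windows-1252 (falling back to Latin-1 for the few code points Windows-1252 leaves undefined), re-encoding everything as clean UTF-8:

```python
# Hypothetical sketch: repair a byte string containing a mixture of
# UTF-8 and Latin-1/Windows-1252, emitting clean UTF-8. Valid UTF-8
# sequences win ties, which matches the heuristic described above.
def fix_mixed_encoding(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        # Try to consume a valid UTF-8 sequence (1 to 4 bytes) first.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            out += chunk
            i += length
            break
        else:
            # Not valid UTF-8: reinterpret this single byte as
            # Windows-1252 and re-encode it as UTF-8. A handful of
            # bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
            # Windows-1252, so fall back to Latin-1 for those.
            byte = data[i:i + 1]
            try:
                out += byte.decode("cp1252").encode("utf-8")
            except UnicodeDecodeError:
                out += byte.decode("latin-1").encode("utf-8")
            i += 1
    return bytes(out)
```

Running the fixed output through PostgreSQL's UTF-8 validation should then succeed; the ambiguity the paragraph above mentions remains, since a Latin-1 sequence that happens to form valid UTF-8 is kept as UTF-8.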