Re: copying json data and backslashes

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: copying json data and backslashes
Date: 2022-11-22 15:15:57
Message-ID: 1342743333.378397.1669130157980@office.mailbox.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On 22/11/2022 15:23 CET Alastair McKinley <a(dot)mckinley(at)analyticsengines(dot)com> wrote:
>
> Hi all,
>
> I have come across this apparently common issue COPY-ing json and wondering if
> there is potentially a better solution.
>
> I am copying data into a jsonb column originating from a 3rd party API. The
> data may have literal \r,\t,\n and also double backslashes.
>
> I discovered that I can cast this data to a jsonb value directly but I can't
> COPY the data without pre-processing.
>
> The example below illustrates my issue (only with \r, but the problem extends
> to other \X combinations).
>
> > do $$
> > lines=[r'{"test" : "\r this data has a carriage return"}']
> >
> > with open("/tmp/test1.json","w") as f:
> > for line in lines:
> > f.write(line.strip() + "\n")
> >
> > $$ language plpython3u;
> >
> > create temp table testing (data jsonb);
> >
> > -- this works
> > insert into testing (data)
> > select l::jsonb
> > from pg_read_file('/tmp/test1.json') f,
> > lateral regexp_split_to_table(f,'\n') l where l <> '';
> >
> > -- fails
> > copy testing (data) from '/tmp/test1.json';
> >
> > -- works
> > copy testing (data) from program $c$ sed -e 's/\\r/\\\\u000a/g' /tmp/test1.json $c$;
> >
>
> Is there any other solution with COPY that doesn't require manual
> implementation of search/replace to handle these edge cases?
> Why does ::jsonb work but COPY doesn't? It seems a bit inconsistent.

COPY handles special backslash sequences[1]. The \r in your sample JSON,
although properly escaped according to JSON, is replaced with an actual
carriage return by COPY before casting to jsonb. The error results from JSON
prohibiting unescaped control characters in strings[2].

You must double escape to pass those characters through COPY.

See how COPY outputs backslash sequences:

-- Actual carriage return:
copy (select e'\r') to stdout;
\r

-- Backslash sequence for carriage return:
copy (select '\r') to stdout;
\\r

[1] https://www.postgresql.org/docs/current/sql-copy.html#id-1.9.3.55.9.2
[2] https://www.json.org/json-en.html

--
Erik

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message pbj@cmicdo.com 2022-11-22 15:30:46 Re: copying json data and backslashes
Previous Message Alastair McKinley 2022-11-22 14:23:16 copying json data and backslashes