
Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes

From:	Aaron Botsis <aaron(at)bt-r(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Francisco Olarte <folarte(at)peoplecall(dot)com>
Subject:	Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
Date:	2015-01-09 01:42:00
Message-ID:	E4532EA7-503E-4E18-917A-97DA12C4E82B@bt-r.com
Lists:	pgsql-bugs pgsql-hackers

> On Jan 8, 2015, at 3:44 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 01/08/2015 03:05 PM, Aaron Botsis wrote:
>>
>>
>>> It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this:
>>>
>>> copy the_table(jsonfield)
>>> from '/path/to/jsondata'
>>> csv quote e'\x01' delimiter e'\x02';
>>
>> While perhaps unnecessary, given the size and simplicity of the patch, IMO it’s a no-brainer to merge (it actually makes the code smaller by 3 lines). It also enables non-JSON use cases whenever one might want to preserve embedded escapes, or use different ones entirely. Do you see other reasons not to commit it?
>
>
> Well, for one thing it's seriously incomplete. You need to be able to change the delimiter as well. Otherwise, any embedded tab in the json will cause you major grief.
>
> Currently the delimiter and the escape MUST be a single-byte, non-NUL character, and there is a check for this in CSV mode. Your patch would allow any arbitrary string (including one of zero length) for the escape in text mode, and would then silently ignore all but the first byte. That's not the way we like to do things.
>
> And, frankly, I would need to spend quite a lot more time thinking about other implications than I have given it so far. This is an area where I tend to be VERY cautious about making changes. This is a fairly fragile ecosystem.
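The CSV workaround quoted above relies on two facts: RFC 8259 requires control characters such as 0x01 and 0x02 to be escaped inside JSON strings, so valid JSON never contains those raw bytes; and in CSV mode an unquoted field is read without any escape processing. Raw newlines can still appear as whitespace between JSON tokens, which is why the input must hold one document per line. The minimal C sketch below checks only those preconditions for a single input line; the helper name fits_csv_trick is hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true if `line`,
 * one input line without its trailing newline, can be loaded verbatim by
 *
 *   copy the_table(jsonfield) from '...' csv quote e'\x01' delimiter e'\x02';
 *
 * That holds when the line contains neither the quote byte (0x01) nor the
 * delimiter byte (0x02), so CSV mode sees a single unquoted field and
 * applies no escape processing, and when it contains no line-break bytes.
 */
static bool fits_csv_trick(const char *line)
{
    assert(line != NULL);
    /* strpbrk returns NULL when none of the listed bytes occur in line. */
    return strpbrk(line, "\x01\x02\r\n") == NULL;
}
```

A one-line document such as {"a":"x\"y"} passes, and so does a line with an embedded tab, since the delimiter here is 0x02; pretty-printed JSON spread over several lines fails.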

Good point.

This version:

* rejects ENCODING in BINARY mode (an existing bug)
* rejects ESCAPE in BINARY mode
* makes COPY TO work with ESCAPE
* requires the escape to be shorter than 2 bytes (empty or one byte) in text mode, and exactly 1 byte in CSV mode

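The option rules in the list above can be modeled as a small validation function. This is a hypothetical sketch, not the code in escape-without-csv-v4.patch: the CopyFormat and CopyOpts types, check_copy_opts, and the error strings are invented for illustration. It rejects ESCAPE and ENCODING in BINARY mode, accepts an empty or one-byte escape in text mode, and requires exactly one byte in CSV mode, so an over-long string is reported instead of being silently truncated to its first byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the COPY options discussed; not PostgreSQL's types. */
typedef enum { COPY_TEXT, COPY_CSV, COPY_BINARY } CopyFormat;

typedef struct {
    CopyFormat  format;
    const char *escape;    /* NULL when ESCAPE was not specified */
    const char *encoding;  /* NULL when ENCODING was not specified */
} CopyOpts;

/*
 * Returns NULL if the options are acceptable, otherwise an illustrative
 * error message. Rejecting over-long strings avoids silently using only
 * their first byte.
 */
static const char *check_copy_opts(const CopyOpts *o)
{
    assert(o != NULL);

    if (o->format == COPY_BINARY) {
        if (o->encoding != NULL)
            return "ENCODING is not allowed in BINARY mode";
        if (o->escape != NULL)
            return "ESCAPE is not allowed in BINARY mode";
        return NULL;
    }

    if (o->escape != NULL) {
        size_t len = strlen(o->escape);

        /* CSV mode: exactly one byte. */
        if (o->format == COPY_CSV && len != 1)
            return "CSV escape must be exactly one byte";
        /* Text mode: len < 2; an empty escape is assumed to mean a NUL
         * escape byte, which never matches inside a string. */
        if (o->format == COPY_TEXT && len > 1)
            return "text escape must be empty or one byte";
    }
    return NULL;
}
```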
A couple more things to note: setting both the escape and delimiter characters to NUL won’t be any different from how you fiddled with them in CSV mode. The code paths will be the same, because we should never encounter a NUL in the middle of the string. And even if we did (and encoding validation didn’t catch it), we’d treat it just like any other field delimiter or escape character.
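The argument about NUL characters can be illustrated with a simplified model of text-mode field splitting. In this hypothetical sketch, parse_line is not PostgreSQL's parser and ignores special sequences such as \N; it drops the escape byte and keeps the byte after it, which is how text mode treats a backslash before an ordinary character such as a double quote. Because a C string ends at its first NUL byte, a delimiter or escape of '\0' never matches inside the loop, so the line passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of COPY text-mode parsing (not PostgreSQL's
 * parser): copies the first field of `line`, after escape processing, into
 * `out` and returns the number of fields. An escape byte is dropped and the
 * byte after it kept literally; special sequences such as \N are ignored.
 */
static int parse_line(const char *line, char delim, char escape,
                      char *out, size_t outsz)
{
    int    nfields = 1;
    size_t n = 0;

    assert(line != NULL && out != NULL && outsz > 0);

    /* The loop stops at the terminating NUL, so a '\0' delimiter or
     * escape can never match a byte inside the string. */
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == delim) {
            nfields++;
            continue;
        }
        if (*p == escape && p[1] != '\0')
            p++;                      /* skip escape, keep next byte */
        if (nfields == 1 && n + 1 < outsz)
            out[n++] = *p;
    }
    out[n] = '\0';
    return nfields;
}
```

With a tab delimiter and backslash escape, the JSON line {"k":"a\"b"} comes back as {"k":"a"b"}, which is no longer valid JSON; with both set to '\0' it is returned byte-for-byte.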

So I’m going to do a bit more testing with another patch tomorrow with delimiters removed. If you can think of any specific cases that might break it, let me know and I’ll make sure to add regression tests for them as well.

Cheers!

Aaron

Attachment	Content-Type	Size
escape-without-csv-v4.patch	application/octet-stream	5.0 KB
unknown_filename	text/plain	3 bytes

In response to

Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes at 2015-01-08 20:44:44 from Andrew Dunstan

Responses

Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes at 2015-01-09 16:37:19 from Andrew Dunstan
Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes at 2015-01-14 14:05:52 from Robert Haas
