From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Duplicate JSON Object Keys |
Date: | 2013-03-13 17:45:43 |
Message-ID: | 5140BB47.7050302@dunslane.net |
Lists: | pgsql-hackers |
On 03/13/2013 12:51 PM, Gavin Flower wrote:
> On 14/03/13 02:02, Andrew Dunstan wrote:
>>
>> On 03/13/2013 08:17 AM, Robert Haas wrote:
>>> On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>>> So my order of preference for the options would be:
>>>>>
>>>>> 1. Have the JSON type collapse objects so the last instance of a
>>>>> key wins and is actually stored
>>>>>
>>>>> 2. Throw an error when a JSON type has duplicate keys
>>>>>
>>>>> 3. Have the accessors find the last instance of a key and return that
>>>>> value
>>>>>
>>>>> 4. Let things remain as they are now
>>>>>
>>>>> On second thought, I don't like 4 at all. It means that the JSON type
>>>>> thinks a value is valid while the accessor does not. They contradict
>>>>> one another.
>>>> You can forget 1. We are not going to have the parser collapse anything.
>>>> Either the JSON it gets is valid or it's not. But the parser isn't going
>>>> to try to MAKE it valid.
>>> Why not? Because it's the wrong thing to do, or because it would be
>>> slower?
>>>
>>> What I think is tricky here is that there's more than one way to
>>> conceptualize what the JSON data type really is. Is it a key-value
>>> store of sorts, or just a way to store text values that meet certain
>>> minimalist syntactic criteria? I had imagined it as the latter, in
>>> which case normalization isn't sensible. But if you think of it the
>>> first way, then normalization is not only sensible, but almost
>>> obligatory. For example, we don't feel bad about this:
>>>
>>> rhaas=# select '1e1'::numeric;
>>>  numeric
>>> ---------
>>>       10
>>> (1 row)
>>>
>>> I think Andrew and I had envisioned this as basically a text data type
>>> that enforces some syntax checking on its input, hence the current
>>> design. But I'm not sure that's the ONLY sensible design.
>>>
>>
>>
>> I think we've moved on from this point, because a) other
>> implementations allow duplicate keys, b) it's trivially easy to make
>> Postgres generate such json, and c) there is some dispute about
>> exactly what the spec mandates.
>>
>> I'll be posting a revised patch shortly that doesn't error out but
>> simply uses the value of the lexically later key.
>>
>> cheers
>>
>> andrew
>>
>>
>>
>>
> How about adding a new function with '_strict' appended to the existing
> name, with an extra parameter 'coalesce' (or other names, if those are
> considered more appropriate)?
>
> That way, slower but more stringent functionality can be added where
> required, and the existing function need not be changed.
>
> If coalesce = true,
> then: the last duplicate is used
> else: an error is returned when the new key is a duplicate.
>
>
>
For good or ill, we now already have a json type that will accept
strings with duplicate keys, and generator functions which can now
generate such strings. If someone wants functions to enforce a stricter
validity check (e.g. via a check constraint on a domain), or to convert
json to a canonical version which strips out prior keys of the same name
and their associated values, then these should be relatively simple to
implement given the parser API in the current patch. But they aren't
part of the current patch, and I think it's way too late to be adding
such things. I have been persuaded by arguments made upthread that the
best thing to do is exactly what other well-known json-accepting
implementations do (e.g. V8), which is to accept json with duplicate
keys and to treat the later key/value as overriding the former
key/value. If I'd done that from the start nobody would now be talking
about this at all.
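To make the two policies discussed above concrete (accept duplicates with the later key/value overriding the former, versus a stricter check that rejects them), here is a minimal sketch in Python rather than SQL. It is not the patch's code and not a Postgres API; `parse_json`, `canonicalize`, `DuplicateKeyError` and the `coalesce` flag (borrowed from Gavin's proposal) are hypothetical names. Python's standard `json` module already resolves duplicates the way V8 does: `json.loads('{"a": 1, "a": 2}')` returns `{'a': 2}`. Its `object_pairs_hook` parameter receives each object's pairs in document order, duplicates included, which is enough to build both a strict validity check and a canonicalizing pass.

```python
import json
from typing import Any


class DuplicateKeyError(ValueError):
    """Hypothetical error raised in strict mode when an object repeats a key."""


def _pairs_hook(coalesce: bool):
    # json.loads calls object_pairs_hook with the (key, value) pairs of every
    # object, nested ones included, in document order with duplicates kept.
    def hook(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for key, value in pairs:
            if key in result and not coalesce:
                raise DuplicateKeyError(f"duplicate object key: {key!r}")
            # Later pairs overwrite earlier ones, so the last instance wins.
            result[key] = value
        return result
    return hook


def parse_json(text: str, coalesce: bool = True) -> Any:
    """Hypothetical helper: coalesce=True keeps the last value of a repeated
    key; coalesce=False rejects the whole document instead."""
    return json.loads(text, object_pairs_hook=_pairs_hook(coalesce))


def canonicalize(text: str) -> str:
    """Hypothetical helper: re-serialize the document so that prior keys of
    the same name, and their values, are stripped out."""
    return json.dumps(parse_json(text, coalesce=True), separators=(",", ":"))
```

With `coalesce=False`, even a nested duplicate such as `'{"x": {"k": 1, "k": 2}}'` raises `DuplicateKeyError`, while `canonicalize('{"a":1,"b":2,"a":3}')` yields `'{"a":3,"b":2}'`. The surviving key keeps the position of its first occurrence, because reassigning an existing dict key does not move it.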
cheers
andrew