From: | Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | magnus(at)hagander(dot)net |
Subject: | JSON manipulation functions |
Date: | 2010-05-14 01:47:37 |
Message-ID: | AANLkTimyZqOqhOpFWv8WiGWVM8eHIaMN9AUTeugvrMtL@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
First off, thank you for allowing me to participate in Google Summer
of Code 2010. I'm sorry I haven't been active for the past few weeks.
Today, I added the wiki page for my project, but the project schedule
is highly tentative:
http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010 .
I'd like to discuss how the functions for type checking and conversion
(the majority of my project) should work. Below is my first draft for
the JSON manipulation function specs, along with annotations.
== Type checking ==
The following function returns the type of any JSON value.
json_type as enum ('null', 'string', 'number', 'boolean', 'object', 'array')
json_type(json) returns json_type
Would it be a bad idea to give an enum and a function the same name
(which appears to be allowed by PostgreSQL) ? If so, json_type(json)
could be json_typeof(json) or something instead.
I thought about having predicates like IS STRING and IS NUMBER,
similar to the IS DOCUMENT predicate used for XML. However, a major
problem with that approach is that it could lead to confusion
involving IS NULL. By my understanding, the JSON datatype will just
be a specialization of TEXT (it just validates the input). Like TEXT,
a JSON string can be 'null'. 'null'::JSON is not NULL. Bear in mind
that json_to_*('null') is NULL, though.
I also thought about having a series of json_is_* functions. I don't
think it's a bad idea, but I think json_type is a better solution.
== text/number/boolean conversion ==
These functions each convert a non-compound JSON value to its
respective return type. Run-time type checking is performed; a
conversion will throw an error if the input JSON is not the correct
type. If the JSON value is 'null', then the return value will be
NULL.
json_to_text(json) returns text
json_to_number(json) returns numeric
json_to_bool(json) returns boolean
These functions convert values to JSON. Passing NULL to any of the
functions below will return 'null':
text_to_json(text) returns json
number_to_json(numeric) returns json
bool_to_json(boolean) returns json
There could be generic value_to_json(any), but not a
json_to_value(json) function. See
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00321.php for
more details.
Conversion to/from number or boolean can also be achieved with
casting. Note well that '"string"'::JSON::TEXT is '"string"', not the
string's actual value. json_to_text is needed for this conversion.
For this reason, casting JSON might seem like something to recommend
against. However, IMHO, casting numbers and booleans to/from JSON is
fine and dandy; the paragraphs below give some weight to this.
I originally considered making json_to_number and number_to_json work
with TEXT instead of NUMERIC. However, as Tom Lane pointed out in the
above link, "Forcing people to insert explicit coercions from text
isn't going to be particularly convenient to use.". Nevertheless,
NUMERIC introduces a problem. For instance, if you say:
SELECT '-1e-38'::NUMERIC;
This conversion knocks out the scientific notation and produces a
41-character string. I seriously doubt that all outside applications
will handle 41-character numbers correctly.
Perhaps there should be individual functions for specific data types,
or maybe just a handful for particular cases. There might be
json_to_int, json_to_float, and json_to_numeric. In any case,
converting to/from number types can be achieved quite easily with
casting.
== array/object conversion ==
The json_object function converts a tuple to a JSON object. If there
are duplicate column names, there will be duplicate keys in the
resulting JSON object.
json_object([content [AS name] [, ...]]) returns json
Likewise, the json_array function converts a tuple to a JSON array.
Column names are ignored.
json_array([content [AS name] [, ...]]) returns json
The json_agg function reduces a set of JSON values to a single array
containing those values.
aggregate json_agg(json) returns json
json_object and json_agg can be used together to convert an entire
result set to one JSON array:
SELECT json_agg(json_object(*)) FROM tablename;
json_keys gets the keys of a JSON object as a set.
json_keys(json) returns setof text
json_values gets the values of a JSON object or the iems of a JSON
array as a set.
json_values(json) returns setof json
Note that all JSON slicing and splicing operations retain the original
formatting of JSON content.
== Miscellaneous ==
The features below would be nice, but will probably not be regarded as
required for this Google Summer of Code project to be considered
complete.
json_cleanup accepts a superset of JSON and, if it can, cleans it up
and returns a valid JSON string. This superset of JSON supports the
following extra features:
* Comments:
- Single-line comments with // and #
- C-style comments: /* comment */
* Unquoted object keys: {key: "value"}
* Single quote strings: 'single quotes; "double quotes" do not need
to be escaped here'
* Single quote escape allowed: "It\'s allowed, but it's not necessary"
* Lax number format (+ sign allowed; digits may be omitted on one
side of the decimal point).
json_cleanup(text) returns json
Example:
SELECT json_cleanup('{/*comment*/number: +.3}');
-- Result is '{"number": 0.3}'::JSON
-> retrieves an item of a JSON object by key. If the object has
duplicate keys, the first key listed will be retrieved. Example:
SELECT ('{"foo": 50, "foo": 100, "bar": "string"}'::JSON)->"foo"
-- Result is '50'::JSON
[] retrieves a value of a JSON array/object by (one-based) index. In
other words, value[n] is equivalent to selecting the nth row of
json_values(value) (provided value is of type JSON). Examples:
SELECT ('[1,2,3,4]'::JSON)[3]
-- Result is '3'::JSON
SELECT ('{"a": 1, "b": 2, "c": 3, "d": 4}'::JSON)[3]
-- Result is '3'::JSON
From | Date | Subject | |
---|---|---|---|
Next Message | Takahiro Itagaki | 2010-05-14 02:24:03 | Re: pg_upgrade code questions |
Previous Message | Tom Lane | 2010-05-14 01:15:24 | Re: quoting and recovery.conf |