jsonpath duplication result

From: Charles Paperman <charles(dot)paperman(at)inria(dot)fr>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Cc: Filip Murlak <fmurlak(at)gmail(dot)com>, matgienieczko(at)gmail(dot)com
Subject: jsonpath duplication result
Date: 2021-09-02 13:06:20
Message-ID: YTDMTNTZcFACww2V@paperman.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I am looking at the documentation and how jsonpath works and I fall on the following example:

(On PostgreSQL 12.4, package for Debian 12.4-3):

SELECT jsonb_path_query(j, '$.**.a.**.b') FROM (SELECT '{"a":{"a":{"b":{"c":3}}}'::jsonb as j) as T;

The result:

jsonb_path_query
------------------
{"c": 3}
{"c": 3}
(2 lignes)

From what in understand, it matches twice the last `b` with every `a` before. For instance, adding one more 'a' on the tree:

SELECT jsonb_path_query(j, '$.**.a.**.b') FROM (SELECT '{"a":{"a":{"a":{"b":{"c":3}}}}}'::jsonb as j) as T;
jsonb_path_query
------------------
{"c": 3}
{"c": 3}
{"c": 3}

This could be a fine semantic for jsonpath query (one could call that the Bag semantic),
however it implies that in some situation the number of output duplicated can become huge.
Actually in the worst case, it could be exponential in the depth of the tree.

On more example (for the fun):

SELECT Count(*) FROM (SELECT jsonb_path_query(j, '$.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.b') FROM (SELECT '{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"b":{"c":3}}}}}}}}}}}}}}}}}}}}}}}}}}}}}'::jsonb as j) as T) as T2;

Takes 24s a returns 20058300 rows.

This semantic for jsonpath is not so classical (other implementation choice to return only once each selected element, in the SET semantic spirit)
and differs from the semantic of xpath, as shown here, which follows the SET semantic:

SELECT xpath('//a//a//b', '<a><a><a><b>1</b></a></a></a>');

Returns:

xpath
------------
{<b>1</b>}

We can of course get the appropriate semantic by adding DISTINCT, but the initial list of rows is computed nevertheless (a O(n) algorithm exists, so it is avoidable).

I also wonder if it is problematic to have simple small query like that that can be design to make the database work endlessly?

Best,

Charles Paperman
Maitre de Conférence à l'université de Lille

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2021-09-02 13:13:39 Re: datfrozenxid not dropping after vacuum
Previous Message Matthew Tice 2021-09-02 12:27:43 Re: datfrozenxid not dropping after vacuum