Re: Abnormal JSON query performance

From: 007reader <007reader(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Abnormal JSON query performance
Date: 2018-05-16 17:56:40
Message-ID: 5afc70db.1c69fb81.65fd5.c3d5@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you for all your recomendations. My conclusion is that at this point, postgres doesn's have a robust way of extracting specific keys from JSON column. In real applications, JSON documents are usually hierarchical and fairly complex. Adding fast data access to parts of the JSON would be a great improvement and keep postgres' leadership in unstructured data space.
-------- Original message --------From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> Date: 5/16/18 8:10 AM (GMT-08:00) To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, reader 1001 <007reader(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org> Subject: Re: Abnormal JSON query performance

2018-05-16 17:07 GMT+02:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
Dmitry Dolgov <9erthalion6(at)gmail(dot)com> writes:

>> On 16 May 2018 at 05:59, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

>> On Tuesday, May 15, 2018, reader 1001 <007reader(at)gmail(dot)com> wrote:

>>> My question remains for hierarchical keys in a JSON document. If I have a

>>> document like below, I clearly can extract key1 using the described rowtype

>>> definition. How can I specify selected keys deeper in the document, e.g.

>>> key3 and key5?

>> I believe you would need a type for each subtree and apply the function

>> multiple times with the result of one feeding the next.

> Yes, you need to defined a type for each subtree, but as far as I can

> tell it's not necessary to apply the function multiple times,

> `jsonb_populate_record` can work with nested types, so it's enough

> just to have every new type included in the previous one.

FWIW, I really doubt that there's much performance win from going further

than the first-level keys.  I suspect most of the cost that the OP is

seeing comes from fetching the large JSONB document out of toast storage

multiple times.  Fetching it just in a single jsonb_populate_record()

call will fix that.  So I'd just return the top-level field(s) as jsonb

column(s) and use the normal -> or ->> operators to go further down.

The vague ideas that I've had about fixing this type of problem

automatically mostly center around detecting the need for duplicate

toast fetches and doing that just once.  For data types having "expanded"

forms, it's tempting to consider also expanding them during the fetch,

but that's less clearly a win.

Just note. If SQL/JSON will be implemented, then this discussion is useless, because JSON_TABLE function allows to read more values per one call.
Regards
Pavel 

                        regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-05-16 18:26:56 BUG #15202: Unexpected behavior with trigger fired on logical replicaion using pg_notify
Previous Message 007reader 2018-05-16 17:49:39 Re: Abnormal JSON query performance