From: | Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> |
---|---|
To: | "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: jsonpath |
Date: | 2019-03-17 22:57:36 |
Message-ID: | aae2a342-28ff-1801-1e8e-c80203001078@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 17.03.2019 21:29, Jonathan S. Katz wrote:
> On 3/17/19 1:14 PM, Alexander Korotkov wrote:
>> On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>>> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
>>>> Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops?
>>> I just used "USING gin(col)" so jsonb_ops.
>> I see. So, jsonb_ops extracts from this query only existence of
>> .length key. And I can bet it exists in all (or almost all) the
>> documents. Thus, optimizer thinks index might be useful, while it's
>> useless. There is not much can be done while we don't have statistics
>> for jsonb (and access to it from GIN extract_query). So, for now we
>> can just refuse from extracting only keys from jsonpath in jsonb_ops.
>> But I think it would be better to just document this issue. In future
>> we should improve that with statistics.
> That seems to make sense, especially given how I've typically stored
> JSON documents in PostgreSQL. It sounds like this particular problem
> would be solved appropriately with JSONB statistics.
GIN jsonb_ops extracts from query
data @? '$.length ? (@ < 150)'
the same GIN entries as from queries
data @? '$.length'
data ? 'length'
If you don't want to extract entries from unsupported expressions, you can try
to use another jsonpath operator @@. Queries will also look like a bit simpler:
data @@ '$.length < 150'
data @@ '$.content like_regex "^Start"'
data @@ '$.content like_regex "risk" flag "i"'
All this queries emit no GIN entries. But note that
data @@ '$ ? (@.content == "foo").length < 150'
emits the same entries as
data @@ '$.content == "foo"'
We already have a POC implementation of jsonb statistics that was written
2 years ago. I rebased it onto the current master yesterday. If it is
interesting, you can find it on my GitHub [1]. But note, that there is
no support for jsonpath operators yet, only boolean EXISTS ?, ?|, ?&, and
CONTAINS @> operators are supported. Also there is no docs, and it works
slowly (a more effective storage method for statistics of individual JSON
paths is needed).
Also there is ability to calculate derived statistics of expressions like
js -> 'x' -> 0 -> 'y'
js #> '{x,0,y}'
using jsonb statistics for columns "js". So the selectivity of expressions
js -> 'x' -> 0 -> 'y' = '123'
js #> '{x,0,y}' >= '123'
also can be estimated (but these expressions can't be used by index on "js").
This topic deserves a separate discussion. I hope, we will start the
corresponding thread for PG13 after we find a more effective way of jsonb
statistics storing.
[1] https://github.com/glukhovn/postgres/tree/jsonb_stats
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2019-03-17 23:19:00 | Re: insensitive collations |
Previous Message | Tom Lane | 2019-03-17 22:41:41 | Re: Rare SSL failures on eelpout |