From: | Shaheed Haque <shaheedhaque(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Using JSONB with nested key-value structures, and skipping/wildcarding the key |
Date: | 2020-06-01 18:18:10 |
Message-ID: | CAHAc2jdUVXz+G6h82vMBVVsvr98a-eNsd7yfwz3Muva5UaajQg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB
column called 'snapshot'. In Python terms, each row's 'snapshot' looks like
this:
======================
snapshot = {
'pay_definition' : {
'1234': {..., 'name': 'foo', ...},
'99': {..., 'name': 'bar', ...},
}
======================
I'd like to find all unique values of 'name' in all rows of MyModel. I have
this working using native JSON functions from the ORM like this:
=====================
class PayDef(Func):
function='to_jsonb'
template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
flat=True)
=====================
So, skipping the ordering/distinct/ORM parts, the core looks like this:
to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')
My question is if this the best way to solve this problem? The way my
current logic works, reading from inside out is, I think:
1. Pass in the 'snapshot'.
2. Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this
key.
3. To skip the unknown numeric keys, "jsonb_each()" turns each key,
value pair into an inner row like ['1234', {...}].
4. To get to the value column of the inner row "row_to_json()->'value'".
5. To get the name field's value "->'name'".
6. A final call to "to_jsonb" in the PayDefs class. This bit is clearly
Django-specific.
For example, I think the pair of calls row_to_json(jsonb_each()) is needed
because there is no jsonb_object_values() to complement
jsonb_object_keys(). Likewise, since all I care about is the string value
of 'name', is there a way to get rid of the PayDefs class, and its
invocation of to_jsonb (this is probably Django-specific)?
To provide context on what "better" might be:
- Snapshot JSONs might easily be 20MB in size.
- Each 'pay_definition' is probablyonly about 1kB in size, and there
might be 50 of them in a snapshot.
- There might be 1000 MyModel instances in a given query.
- I'm using PostgreSQL 12
so my concern is not have the database server or Django perform extraneous
work converting between strings and JSON for example.
Thanks, Shaheed
P.S. I posted a Django-centric version of this to the relevant mailing list
but got no replies; nevertheless, apologies for the cross post.
From | Date | Subject | |
---|---|---|---|
Next Message | Jeremy Schneider | 2020-06-01 22:33:46 | Re: Pglogical 2.3.0 in AWS RDS 12.2 |
Previous Message | Karsten Hilbert | 2020-06-01 18:08:18 | Re: Oracle vs. PostgreSQL - a comment |