DISTINCT on jsonb fields and Indexes

From: Sankar P <sankar(dot)curiosity(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: DISTINCT on jsonb fields and Indexes
Date: 2020-06-22 04:43:37
Message-ID: CAMSEaH6ri7hH3-MywK+Vrb=RpkhzMufU2xXFdSKV9zubmniieg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a table with the schema:

CREATE TABLE fluent (id BIGSERIAL, record JSONB);

Then I created a couple of indexes:
1) CREATE INDEX idx_records ON fluent USING GIN (record);
2) CREATE INDEX idx_btree_records ON fluent USING BTREE ((record ->>
'destinationServiceName'));

Now, if I run a query to look up the distinct values of the field
`destinationServiceName`, via:

select distinct record ->> 'destinationServiceName' from fluent;

This query takes a lot of time, and does full table scan. The query planning is:

# explain analyze select distinct record ->> 'destinationServiceName'
from fluent;
QUERY PLAN
Unique
(cost=1103803.97..1118803.97 rows=3000000 width=32) (actual
time=77282.528..78549.877 rows=10 loops=1)
-> Sort (cost=1103803.97..1111303.97 rows=3000000 width=32)
(actual time=77282.525..78046.992 rows=3000000 loops=1)
Sort Key: ((record ->> 'destinationServiceName'::text))
Sort Method: external merge Disk: 117456kB
-> Seq Scan on fluent (cost=0.00..637500.00 rows=3000000
width=32) (actual time=14.440..69545.867 rows=3000000 loops=1)
Planning Time: 0.187 ms
Execution Time: 78574.221 ms

I see that none of the indexes are used. I want to do a few
aggregations, like "what are the distinct pairs of
`destinationServiceName` and `sourceServiceName` etc. " in these
records. Now, is such a querying possible at all without doing full
table scans ? I get such kind of aggregation support in elasticsearch
+ kibana, without doing full-table scan (I believe so, but I do not
have data to back this claim) and I am trying to see if this is
possible with any other extra index creation in postgres.

Any suggestions ? Thanks.

--
Sankar P
http://psankar.blogspot.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message raf 2020-06-22 05:20:55 Re: Hiding a GUC from SQL
Previous Message Thomas Munro 2020-06-22 04:02:00 Re: Definition of REPEATABLE READ