From: | Sankar P <sankar(dot)curiosity(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | DISTINCT on jsonb fields and Indexes |
Date: | 2020-06-22 04:43:37 |
Message-ID: | CAMSEaH6ri7hH3-MywK+Vrb=RpkhzMufU2xXFdSKV9zubmniieg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I have a table with the schema:
CREATE TABLE fluent (id BIGSERIAL, record JSONB);
Then I created a couple of indexes:
1) CREATE INDEX idx_records ON fluent USING GIN (record);
2) CREATE INDEX idx_btree_records ON fluent USING BTREE ((record ->>
'destinationServiceName'));
Now, if I run a query to look up the distinct values of the field
`destinationServiceName`, via:
select distinct record ->> 'destinationServiceName' from fluent;
This query takes a lot of time, and does full table scan. The query planning is:
# explain analyze select distinct record ->> 'destinationServiceName'
from fluent;
QUERY PLAN
Unique
(cost=1103803.97..1118803.97 rows=3000000 width=32) (actual
time=77282.528..78549.877 rows=10 loops=1)
-> Sort (cost=1103803.97..1111303.97 rows=3000000 width=32)
(actual time=77282.525..78046.992 rows=3000000 loops=1)
Sort Key: ((record ->> 'destinationServiceName'::text))
Sort Method: external merge Disk: 117456kB
-> Seq Scan on fluent (cost=0.00..637500.00 rows=3000000
width=32) (actual time=14.440..69545.867 rows=3000000 loops=1)
Planning Time: 0.187 ms
Execution Time: 78574.221 ms
I see that none of the indexes are used. I want to do a few
aggregations, like "what are the distinct pairs of
`destinationServiceName` and `sourceServiceName` etc. " in these
records. Now, is such a querying possible at all without doing full
table scans ? I get such kind of aggregation support in elasticsearch
+ kibana, without doing full-table scan (I believe so, but I do not
have data to back this claim) and I am trying to see if this is
possible with any other extra index creation in postgres.
Any suggestions ? Thanks.
--
Sankar P
http://psankar.blogspot.com
From | Date | Subject | |
---|---|---|---|
Next Message | raf | 2020-06-22 05:20:55 | Re: Hiding a GUC from SQL |
Previous Message | Thomas Munro | 2020-06-22 04:02:00 | Re: Definition of REPEATABLE READ |