Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

From: Ninad Shah <nshah(dot)postgres(at)gmail(dot)com>
To: balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds
Date: 2021-09-02 19:42:16
Message-ID: CAOFEiBdyfFo9ZYJCG8JXbwPWDX8vQWnFh92=GvHEtcE=Oj_3wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Try the pg_tgrm extension. It is a rich set of operators.

Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 23:39, balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com>
wrote:

> HI Ninad
>
> Thanks for your reply.
> If bitmap index should not be used. Do i need to disable it for the time
> being and carry out the test.
>
> The documentation in pg_similarity shows that index can be created on text
> column using gin with gin_similarity_ops.
> The same way the index is created like
> CREATE INDEX on address using GIN(complete_address gin_similarity_ops);
>
> AFAIK I have not seen any other operators other than gin in the
> pg_smilarity extension.
>
> Thanks
> C.R.Bala
>
> On Thu, Sep 2, 2021 at 8:17 PM Ninad Shah <nshah(dot)postgres(at)gmail(dot)com>
> wrote:
>
>> Hi Bala,
>>
>> Are your statistics updated? Also, have you used the gin
>> operator(gin_similarity_ops) correctly?
>>
>> It is fetching just 6 records out of a million, hence, it should not go
>> for bitmap index scan. As bitmap index scan loads a complete index, and
>> access relevant pages from the table later by bitmap heap scan.
>>
>>
>> Regards,
>> Ninad Shah
>>
>>
>> On Thu, 2 Sept 2021 at 16:39, balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com>
>> wrote:
>>
>>> Hi Team
>>>
>>> We have encountered a problem in our testing environment.
>>> I have a scenario where I am running a similarity match for an address
>>> I have created a table with following number of records
>>> 1603423
>>>
>>> We are using pg_similarity extension in postgresql version is 13.
>>>
>>> And I have created GIN index (since i am using pg_similarity) library
>>> jaccard similarity method
>>> when I run the Explain analyze
>>> EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
>>> nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
>>> where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
>>> 201017' order by qsim DESC;
>>> QUERY PLAN
>>>
>>> Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
>>> time=12101.194..12101.197 rows=6 loops=1)
>>>
>>> Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara
>>> integrity up ghaziabad 201017'::text)) DESC
>>> Sort Method: quicksort Memory: 25kB
>>> -> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
>>> (actual time=3516.233..12101.172 rows=6 loops=1)
>>> Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
>>> up ghaziabad 201017'::text)
>>> Rows Removed by Index Recheck: 1039186
>>> Heap Blocks: exact=58018
>>> -> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
>>> rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
>>> Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
>>> up ghaziabad 201017'::text)
>>> Planning Time: 0.141 ms
>>> Execution Time: 12101.245 ms
>>> (11 rows)
>>> it took 12 seconds
>>> following are my pgconf file settings
>>> shared buffer as 4GB
>>> work_mem 256 MB
>>> maintenence_work_mem 512MB
>>> autovacuum_work_mem 20MB
>>> My index definition is this "address_complete_address_idx1" gin
>>> (complete_address gin_similarity_ops)
>>>
>>> It is taking the index correctly. But why it took 12 seconds to process
>>> I really don't understand.
>>>
>>> Please help.
>>>
>>> Thanks
>>> C.R.Bala
>>>
>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ninad Shah 2021-09-02 19:43:05 Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds
Previous Message Ninad Shah 2021-09-02 19:38:40 Re: memory consumption of memory for postgres db container