Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

From: Ninad Shah <nshah(dot)postgres(at)gmail(dot)com>
To: balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds
Date: 2021-09-02 14:47:17
Message-ID: CAOFEiBeo46Og+eTs5vyxcj7Gtck1MOE-VOae13WOhcrnmSf3aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Bala,

Are your statistics updated? Also, have you used the gin
operator(gin_similarity_ops) correctly?

It is fetching just 6 records out of a million, hence, it should not go for
bitmap index scan. As bitmap index scan loads a complete index, and access
relevant pages from the table later by bitmap heap scan.

Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 16:39, balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com>
wrote:

> Hi Team
>
> We have encountered a problem in our testing environment.
> I have a scenario where I am running a similarity match for an address
> I have created a table with following number of records
> 1603423
>
> We are using pg_similarity extension in postgresql version is 13.
>
> And I have created GIN index (since i am using pg_similarity) library
> jaccard similarity method
> when I run the Explain analyze
> EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
> nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
> where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
> 201017' order by qsim DESC;
> QUERY PLAN
>
> Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
> time=12101.194..12101.197 rows=6 loops=1)
>
> Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity
> up ghaziabad 201017'::text)) DESC
> Sort Method: quicksort Memory: 25kB
> -> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
> (actual time=3516.233..12101.172 rows=6 loops=1)
> Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
> up ghaziabad 201017'::text)
> Rows Removed by Index Recheck: 1039186
> Heap Blocks: exact=58018
> -> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
> rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
> Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up
> ghaziabad 201017'::text)
> Planning Time: 0.141 ms
> Execution Time: 12101.245 ms
> (11 rows)
> it took 12 seconds
> following are my pgconf file settings
> shared buffer as 4GB
> work_mem 256 MB
> maintenence_work_mem 512MB
> autovacuum_work_mem 20MB
> My index definition is this "address_complete_address_idx1" gin
> (complete_address gin_similarity_ops)
>
> It is taking the index correctly. But why it took 12 seconds to process I
> really don't understand.
>
> Please help.
>
> Thanks
> C.R.Bala
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ninad Shah 2021-09-02 14:49:46 Re: calling store procedure / insert statement never complete
Previous Message M Tarkeshwar Rao 2021-09-02 14:35:20 memory consumption of memory for postgres db container