From: | balasubramanian c r <crbs(dot)siebel(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Querying a table with jaccard similarity with 1.6 million records take 12 seconds |
Date: | 2021-09-02 11:09:09 |
Message-ID: | CANnzXMPHf2r8ONRWodAMHskR86mB6z6ArmN+y636dn6qfvsL4g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Team
We have encountered a problem in our testing environment.
I have a scenario where I am running a similarity match for an address
I have created a table with following number of records
1603423
We are using pg_similarity extension in postgresql version is 13.
And I have created GIN index (since i am using pg_similarity) library
jaccard similarity method
when I run the Explain analyze
EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
201017' order by qsim DESC;
QUERY PLAN
Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
time=12101.194..12101.197 rows=6 loops=1)
Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity
up ghaziabad 201017'::text)) DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
(actual time=3516.233..12101.172 rows=6 loops=1)
Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up
ghaziabad 201017'::text)
Rows Removed by Index Recheck: 1039186
Heap Blocks: exact=58018
-> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up
ghaziabad 201017'::text)
Planning Time: 0.141 ms
Execution Time: 12101.245 ms
(11 rows)
it took 12 seconds
following are my pgconf file settings
shared buffer as 4GB
work_mem 256 MB
maintenence_work_mem 512MB
autovacuum_work_mem 20MB
My index definition is this "address_complete_address_idx1" gin
(complete_address gin_similarity_ops)
It is taking the index correctly. But why it took 12 seconds to process I
really don't understand.
Please help.
Thanks
C.R.Bala
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2021-09-02 12:14:28 | Re: No xmin in pg_database |
Previous Message | Tom Lane | 2021-09-02 10:41:43 | Re: No xmin in pg_database |