Quick Links

pg_trgm for address search

From:	Sumit Raja <sumitraja(at)gmail(dot)com>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	pg_trgm for address search
Date:	2021-02-04 05:24:59
Message-ID:	CAD4nrSd4RF3rvCNJZ5rBttKdXAHMLOsKo4PUcohFUk8oZDf7ig@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hello,

I am trying to get a functioning postgres address search capability for
Australian addresses using tsearch or pg_trgm. pg_trgm is actually better
suited in this case as it allows progressive entry e.g. ' 1, 20 Kle' gives
a good set of results across Kelm Avenue, Kleins Av etc.

Good performance with tsearch (<1s response times) for non progressive
matches but the pg_trgm performance varies from 3.5 to 15 seconds.

Table is very simple:

CREATE TABLE address_search.tsearch_address_detail (
address_detail_pid character varying(15) NOT NULL,
address_state_abbreviation character varying(3) NOT NULL,
address_concat text not null,
address_concat_ts tsvector
);

CREATE INDEX idx_places_trgm_gin_addr ON
address_search.tsearch_address_detail USING gin(address_concat
gin_trgm_ops);

CREATE INDEX idx_places_trgm_gist_addr ON
address_search.tsearch_address_detail USING gist(address_concat
gist_trgm_ops);

CREATE INDEX idx_places_ts_gin_addr ON
address_search.tsearch_address_detail USING GIN (address_concat_ts);

Actual data looks like the below:

address_detail_pid | address_state_abbreviation |
address_concat |
address_concat_ts
--------------------+----------------------------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------
GAWA_163274127 | WA | Unit 1, 20 Klem Avenue,
Salter Point, WA 6152 | '1':2 '20':3
'6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
GAWA_163274129 | WA | Unit 2, 20 Klem Avenue,
Salter Point, WA 6152 | '2':2 '20':3
'6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
GAWA_163274130 | WA | Unit 2, 3 Klem Avenue,
Salter Point, WA 6152 | '2':2 '3':3
'6152':9 'avenu':5 'klem':4 'point':7 'salter':6 'unit':1 'wa':8
GAWA_163274255 | WA | 11 Lancefield Street,
Laverton, WA 6440 | '11':1 '6440':6
'lancefield':2 'laverton':4 'street':3 'wa':5
GAWA_163274256 | WA | 13 Lancefield Street,
Laverton, WA 6440 | '13':1 '6440':6
'lancefield':2 'laverton':4 'street':3 'wa':5

and query is:

SELECT address_detail_pid, address_concat, word_similarity('1, 20 kle',
address_concat) AS sml
FROM address_search.tsearch_address_detail
WHERE '1, 20 kle' <% address_concat
ORDER BY sml DESC limit 10;

The explain (analyze, buffer) is (https://explain.depesz.com/s/tvZ9)
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=52551.77..52551.79 rows=10 width=65) (actual
time=3119.791..3119.793 rows=10 loops=1)
Buffers: shared hit=6432 read=4564
-> Sort (cost=52551.77..52590.50 rows=15492 width=65) (actual
time=3119.788..3119.789 rows=10 loops=1)
Sort Key: (word_similarity('1, 20 kle'::text, address_concat)) DESC
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=6432 read=4564
-> Bitmap Heap Scan on tsearch_address_detail
(cost=252.06..52216.99 rows=15492 width=65) (actual time=547.976..3119.067
rows=394 loops=1)
Recheck Cond: ('1, 20 kle'::text <% address_concat)
Rows Removed by Index Recheck: 3791
Heap Blocks: exact=3991
Buffers: shared hit=6429 read=4564
-> Bitmap Index Scan on idx_places_trgm_gin_addr
(cost=0.00..248.19 rows=15492 width=0) (actual time=547.380..547.380
rows=4185 loops=1)
Index Cond: ('1, 20 kle'::text <% address_concat)
Buffers: shared hit=3991 read=3011
Planning Time: 44.701 ms
Execution Time: 3120.052 ms
(16 rows)

Table size is 3026MB and GIN index size is 293 MB.

I've increased shared_buffers to 800MB, work_mem=1500MB,
effective_cache_size=2GB.

Are there any optimisations I can make or should I be building the data set
differently for better searching by pg_trgm? Or using a combination of
tsearch and pg_trgm?

Thanks

Sumit

Browse pgsql-general by date

	From	Date	Subject
Next Message	Niels Jespersen	2021-02-04 05:31:12	SV: table returning function for each row in other resultset
Previous Message	Niels Jespersen	2021-02-04 05:03:57	SV: SV: Npgsql and the Connection Service File