Performance issue on GIN index with gin_trgm_ops index column

From: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Performance issue on GIN index with gin_trgm_ops index column
Date: 2022-07-27 15:39:21
Message-ID: CAMX1ThhJswV7Mi19qUArLwHQqWoTsXgqMLuGnHoe6kAdbZf+hA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

We are investigating a performance issue with searching on a GIN with
gin_trgm_ops indexed column. This specific (organization_id,aggregate_type)
has in total 19K records and the (organization_id) has in total 30K
records. The search record table has in total 38M records.

The table and index are defined as follows:

Column | Type | Collation | Nullable |
Default

-----------------+-------------------+-----------+----------+---------------------------------------------------------------------
id | bigint | | not null |
nextval('search_records_id_seq'::regclass)
organization_id | uuid | | not null |
aggregate_id | uuid | | not null |
aggregate_type | character varying | | not null |
document | text | | |
Indexes:
"search_records_pkey" PRIMARY KEY, btree (id)
"search_records_keys" UNIQUE, btree (organization_id, aggregate_id,
aggregate_type)
"search_records_btree_gin" gin ((organization_id::character varying),
aggregate_type, document gin_trgm_ops)

The query we execute is:

select aggregate_id from search_records where organization_id::varchar =
'975097c5-e760-4603-9236-fcf2e8580a7c' and aggregate_type = 'FooRecord' and
document ilike '%user(dot)name(at)gmail(dot)com%';

Resulting in the following plan:

Bitmap Heap Scan on search_records (cost=2184.00..2188.02 rows=1
width=104) (actual time=4332.007..4332.008 rows=1 loops=1)
Recheck Cond: ((((organization_id)::character varying)::text =
'975097c5-e760-4603-9236-fcf2e8580a7c'::text) AND ((aggregate_type)::text =
'FooRecord'::text) AND (document ~~* '%user(dot)name(at)gmail(dot)com%'::text))
Heap Blocks: exact=1
Buffers: shared hit=23920 read=9752
I/O Timings: read=4017.360
-> Bitmap Index Scan on search_records_btree_gin (cost=0.00..2184.00
rows=1 width=0) (actual time=4331.987..4331.987 rows=1 loops=1)
Index Cond: ((((organization_id)::character varying)::text =
'975097c5-e760-4603-9236-fcf2e8580a7c'::text) AND ((aggregate_type)::text =
'FooRecord'::text) AND (document ~~* '%user(dot)name(at)gmail(dot)com%'::text))
Buffers: shared hit=23920 read=9751
I/O Timings: read=4017.355
Planning Time: 0.268 ms
Execution Time: 4332.030 ms
(11 rows)

We are running on Postgres RDS with engine version 12.8 with 32GB memory
and 8GB shared_buffer. We have 442GB of 2000GB diskspace left.

Sometimes we also have queries for this particular customer which take more
than 20 seconds. The content of the document in those cases are similar to:

User Name Kees postgresstreet Amsterdam 1000 AA user(dot)name(at)gmail(dot)com 1234

Are we doing something wrong? I find the I/O timings quite high, does this
mean that it took 4000MS to read the 9752 blocks from the disk?

Any other tips and or suggestions are welcome.

Kind regards,
Lars Vonk

Browse pgsql-general by date

  From Date Subject
Next Message Michael Nolan 2022-07-27 15:48:41 Re: Feature request: psql --idle
Previous Message Julien Rouhaud 2022-07-27 14:07:02 Re: Feature request: psql --idle