Re: Inconsistent query times and spiky CPU with GIN tsvector search

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Scott Rankin <srankin(at)motus(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inconsistent query times and spiky CPU with GIN tsvector search
Date: 2018-09-04 19:15:19
Message-ID: 2633cf81b451451ebcb4bb213a256ad5975388ca.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Scott Rankin wrote:
> We are running postgresql 9.4 and we have a table where we do some
> full-text searching using a GIN index on a tsvector column:
>
> CREATE INDEX location_search_tsvector_idx
> ON public.location_search USING gin
> (search_field_tsvector)
> TABLESPACE pg_default;
>
> This setup has been running very well, but as our load is getting heavier,
> the performance seems to be getting much more inconsistent.
> Our searches are run on a dedicated read replica, so this server is only
> doing queries against this one table. IO is very low, indicating to me
> that the data is all in memory. However, we're getting some queries taking
> upwards of 15-20 seconds, while the average is closer to 1 second.
>
> A sample query that's running slowly is
>
> explain (analyze, buffers)
> SELECT ls.location AS locationId FROM location_search ls
> WHERE ls.client = 1363
> AND ls.favorite = TRUE
> AND search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*')
> LIMIT 4;
>
> And the explain analyze is:
>
> Limit (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.120..4471.120 rows=0 loops=1)
> Buffers: shared hit=25613
> -> Bitmap Heap Scan on location_search ls (cost=39865.85..39877.29 rows=1 width=8) (actual time=4471.117..4471.117 rows=0 loops=1)
> Recheck Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
> Filter: (favorite AND (client = 1363))
> Rows Removed by Filter: 74
> Heap Blocks: exact=84
> Buffers: shared hit=25613
> -> Bitmap Index Scan on location_search_tsvector_idx (cost=0.00..39865.85 rows=6 width=0) (actual time=4470.895..4470.895 rows=84 loops=1)
> Index Cond: (search_field_tsvector @@ to_tsquery('CA-94:* &E &San:*'::text))
> Buffers: shared hit=25529
> Planning time: 0.335 ms
> Execution time: 4487.224 ms

Not sure, but maybe you are suffering from bad performance because of a
long "GIN pending list".

If yes, then the following can help:

ALTER INDEX location_search_tsvector_idx SET (gin_pending_list_limit = 512);

Or you can disable the feature altogether:

ALTER INDEX location_search_tsvector_idx SET (fastupdate = off);

Then clean the pending list with

SELECT gin_clean_pending_list('location_search_tsvector_idx'::regclass);

Disabling the pending list will slow down data modification, but should
keep the SELECT performance stable.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2018-09-05 00:35:19 Re: Query is slow when run for first time; subsequent execution is fast
Previous Message Scott Rankin 2018-09-04 18:09:10 Inconsistent query times and spiky CPU with GIN tsvector search