Re: How to speed up pg_trgm / gin index scan

From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Christian Ramseyer <rc(at)networkz(dot)ch>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to speed up pg_trgm / gin index scan
Date: 2015-06-22 18:32:21
Message-ID: CAJKUy5hOcKZe6dtZPuyt8+wCBUOyRu-68uLTZxUuUs_1wnk09Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jun 22, 2015 at 6:51 AM, Christian Ramseyer <rc(at)networkz(dot)ch> wrote:
>
> DM=# \d+ logs_01
>
> Column | Type |
> --------------+-----------------------------+
> host | character varying(255) |
> facility | character varying(10) |
> priority | character varying(10) |
> tag | character varying(255) |
> log_date | timestamp without time zone |
> program | character varying(255) |
> msg | text |
> seq | bigint |
>
> Indexes:
> "logs_01_pkey" PRIMARY KEY, btree (seq)
> "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
> "logs_01_date_index" btree (log_date)
> "tridx_logs_01_msg" gin (msg gin_trgm_ops)
>
>
> DM=# select count(*) from logs_01;
> count
> ----------
> 83052864
>
>
> I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is around 60 GB on the 35 GB table:
>
> DM=# select count(*) from logs_01;
> count
> ----------
> 83052864
>
>
> DM=# \dt+ logs_01
> List of relations
> Schema | Name | Type | Owner | Size | Description
> --------+---------+-------+----------+-------+-------------
> public | logs_01 | table | postgres | 35 GB |
>
> DM=# \di+ tridx_logs_01_msg
> List of relations
> Schema | Name | Type | Owner | Table | Size | Description
> --------+-------------------+-------+----------+---------+-------+-------------
> public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
>
>

What version of postgres is this? GIN indexes improved a lot in 9.4,
they use less than half the space and have doubled the speed (on
average).

Now, whatever version you have; a GIN index has two data structures,
the main one in which the index entries are stored as key-value pairs
(please someone correct my description of the situation) and a pending
list, which is a temporary unsorted list of pending entries in which
all the newly inserted tuples arrive until a VACUUM (or until the
pending list grows upto work_mem) moves that list into the main
structure.

That happens to avoid the penalty of inserting new rows in the main
structure which could be expensive.
But while the pending list grows the speed of the index decreases. And
because you have work_mem in 16Gb your pending list is possibly
growing without control.

if you have 9.3 or superior you can know how big is that pending list
installing pgstattuple.

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatginindex('tridx_logs_01_msg');

NOTE: remember that pending_pages is expressed in 8kb-pages

if that is the problem or if you are in <= 9.2 then try VACUUM the table

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Suresh Raja 2015-06-22 21:09:24 Re: extracting PII data and transforming it across table.
Previous Message Christian Ramseyer 2015-06-22 17:39:52 Re: How to speed up pg_trgm / gin index scan