From: | Christian Ramseyer <rc(at)networkz(dot)ch> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | How to speed up pg_trgm / gin index scan |
Date: | 2015-06-22 11:51:57 |
Message-ID: | 5587F6DD.6000307@networkz.ch |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi
I have a pretty large table with syslog messages.
It is already partitioned by month, and for a single month I have e.g.
DM=# \d+ logs_01
Column | Type |
--------------+-----------------------------+
host | character varying(255) |
facility | character varying(10) |
priority | character varying(10) |
tag | character varying(255) |
log_date | timestamp without time zone |
program | character varying(255) |
msg | text |
seq | bigint |
Indexes:
"logs_01_pkey" PRIMARY KEY, btree (seq)
"idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
"logs_01_date_index" btree (log_date)
"tridx_logs_01_msg" gin (msg gin_trgm_ops)
DM=# select count(*) from logs_01;
count
----------
83052864
I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is around 60 GB on the 35 GB table:
DM=# select count(*) from logs_01;
count
----------
83052864
DM=# \dt+ logs_01
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+-------+-------------
public | logs_01 | table | postgres | 35 GB |
DM=# \di+ tridx_logs_01_msg
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------------+-------+----------+---------+-------+-------------
public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
A typical query on this table looks like this:
explain analyze
select log_date, host, msg
from logs_01 as log where log.msg like '%192.23.33.177%'
and log.log_date >= '2015-1-18 1:45:24'
and log.log_date <= '2015-1-19 1:45:24'
order by log_date asc offset 200 limit 50;
It yields a promising explain that shows that the index is used, but unfortunately the bitmap index scan on the GIN index takes quite long (40 seconds)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34510.06..34510.06 rows=1 width=195) (actual time=42971.002..42971.015 rows=50 loops=1)
-> Sort (cost=34509.75..34510.06 rows=124 width=195) (actual time=42970.960..42970.990 rows=250 loops=1)
Sort Key: log_date
Sort Method: top-N heapsort Memory: 152kB
-> Bitmap Heap Scan on logs_01 log (cost=34009.21..34505.44 rows=124 width=195) (actual time=42963.969..42969.725 rows=2472 loops=1)
Recheck Cond: ((msg ~~ '%192.23.33.177%'::text) AND (log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND (log_date <= '2015-01-19 01:45:24'::timestamp without time zone))
-> BitmapAnd (cost=34009.21..34009.21 rows=124 width=0) (actual time=42962.562..42962.562 rows=0 loops=1)
-> Bitmap Index Scan on tridx_logs_01_msg (cost=0.00..6992.15 rows=8020 width=0) (actual time=42731.145..42731.145 rows=168489 loops=1)
Index Cond: (msg ~~ '%192.23.33.177%'::text)
-> Bitmap Index Scan on logs_01_date_index (cost=0.00..27016.75 rows=1287939 width=0) (actual time=180.055..180.055 rows=1173048 loops=1)
Index Cond: ((log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND (log_date <= '2015-01-19 01:45:24'::timestamp without time zone))
Total runtime: 42971.137 ms
(also on http://explain.depesz.com/s/KpaB)
Any good ideas on how I could speed this up a bit?
I have already tried to throw quite a bunch of memory at the problem:
shared_buffers = 64GB
work_mem = 16GB
but it didn't improve between this and the 32GB shared/ 2GB work GB I had before.
This is on Postgres 9.1.15 on Linux.
Thanks
Christian
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Ramsey | 2015-06-22 13:09:23 | Less is More |
Previous Message | Piotr Gackiewicz | 2015-06-22 09:06:39 | Re: pg_dump 8.4.9 failing after upgrade to openssl-1.0.1e-30.el6_6.11.x86_64 on redhat linux |