Re: Why is this query not using GIN index?

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Aaron Lewis <the(dot)warl0ck(dot)1989(at)gmail(dot)com>
Cc: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why is this query not using GIN index?
Date: 2016-11-13 18:40:42
Message-ID: CAF4Au4xW0428ownCkAYZ46p_dS8ggDs5c=409T_Dkpbip1g7kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the(dot)warl0ck(dot)1989(at)gmail(dot)com>
wrote:

> Sigh, didn't notice that. Thanks for the heads up.
>
> It takes 500ms with 10m rows, could it be faster?
>

sure. Recheck with function call is pretty expensive, so I'd not recommend
to create functional index, just create separate column of type tsvector
(materialize to_tsvector) and create gin index on it. You should surprise.

> I've increased work_mem to 256MB
>
> test=# explain analyze select * from mytable where
> to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
> QUERY PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> ---------------
> Limit (cost=684.06..2949.42 rows=1000 width=83) (actual
> time=348.506..536.483 rows=1000 loops=1)
> -> Bitmap Heap Scan on mytable (cost=661.41..158917.22 rows=69859
> width=83) (actual time=345.354..536.199 rows=1010 loops=1)
> Recheck Cond: (to_tsvector('english'::regconfig, title) @@
> '''x264'''::tsquery)
> Rows Removed by Index Recheck: 12242
> Heap Blocks: exact=20 lossy=186
> -> Bitmap Index Scan on name_fts (cost=0.00..643.95
> rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
> loops=1)
> Index Cond: (to_tsvector('english'::regconfig, title)
> @@ '''x264'''::tsquery)
> Planning time: 0.144 ms
> Execution time: 537.212 ms
> (9 rows)
>
> On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
> <julien(dot)rouhaud(at)dalibo(dot)com> wrote:
> > On 13/11/2016 15:26, Aaron Lewis wrote:
> >> Hi Oleg,
> >>
> >> Can you elaborate on the title column? I don't get it.
> >>
> >
> >>>> create table mytable(hash char(40), title varchar(500));
> >>>> create index name_fts on mytable using gin(to_tsvector('english',
> >>>> 'title'));
> >
> > You created an index on the text 'title', not on the title column, so
> > the index is useless.
> >
> > Drop the existing index and create this one instead:
> >
> > create index name_fts on mytable using gin(to_tsvector('english',
> title));
> >
> >> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov(at)gmail(dot)com>
> wrote:
> >>>
> >>>
> >>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <
> the(dot)warl0ck(dot)1989(at)gmail(dot)com>
> >>> wrote:
> >>>>
> >>>> I have a simple table, and a gin index,
> >>>>
> >>>> create table mytable(hash char(40), title varchar(500));
> >>>> create index name_fts on mytable using gin(to_tsvector('english',
> >>>> 'title'));
> >>>
> >>>
> >>>
> >>> ^^^^^
> >>>
> >>>>
> >>>> create unique index md5_uniq_idx on mytable(hash);
> >>>>
> >>>> When I execute a query with tsquery, the GIN index was not in use:
> >>>>
> >>>> test=# explain analyze select * from mytable where
> >>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
> >>>> QUERY PLAN
> >>>>
> >>>> ------------------------------------------------------------
> --------------------------------------------------------
> >>>> Limit (cost=0.00..277.35 rows=10 width=83) (actual
> >>>> time=0.111..75.549 rows=10 loops=1)
> >>>> -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
> >>>> (actual time=0.110..75.546 rows=10 loops=1)
> >>>> Filter: (to_tsvector('english'::regconfig, (title)::text) @@
> >>>> '''abc'' | ''def'''::tsquery)
> >>>> Rows Removed by Filter: 10221
> >>>> Planning time: 0.176 ms
> >>>> Execution time: 75.564 ms
> >>>> (6 rows)
> >>>>
> >>>> Any ideas?
> >>>>
> >
> > --
> > Julien Rouhaud
> > http://dalibo.com - http://dalibo.org
>
>
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2016-11-13 18:51:44 Re: Why is this query not using GIN index?
Previous Message John R Pierce 2016-11-13 17:27:57 Re: pg_dumpall: could not connect to database "template1": FATAL: