Re: query not using GIN index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "Guo, Yun" <YGuo(at)cvent(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query not using GIN index
Date: 2015-08-24 17:04:42
Message-ID: CAMkU=1yiQBCJAXSVoYCPGZ+uqy6tpJv-taw4iGdUPX6kxpshpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Aug 24, 2015 at 8:18 AM, Guo, Yun <YGuo(at)cvent(dot)com> wrote:

>
>
> From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> Date: Friday, August 21, 2015 at 10:44 PM
> To: Yun <yguo(at)cvent(dot)com>
> Subject: Re: [PERFORM] query not using GIN index
>
> On Fri, Aug 21, 2015 at 6:55 PM, Guo, Yun <YGuo(at)cvent(dot)com> wrote:
>
>> Hi,
>>
>> We have a query on a column with GIN index, but query plan chooses not
>> using the index but do an seq scan whichi is must slower
>>
>> CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
>> gin (access_tokens);
>>
>> explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
>> (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------------
>> Limit (cost=0.00..7.46 rows=1 width=157) (actual time=260.376..260.377
>> rows=1 loops=1)
>> -> Seq Scan on access_grants (cost=0.00..29718.03 rows=3985
>> width=157) (actual time=260.373..260.373 rows=1 loops=1)
>> Filter: (access_tokens @>
>> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>> Rows Removed by Filter: 796818
>> Total runtime: 260.408 ms
>>
>
>
> What version are you running? What are your non-default configuration
> settings (particularly for the *_cost parameters)?
>
> select name,setting from pg_settings where name like '%cost';
> name | setting
> ----------------------+---------
> cpu_index_tuple_cost | 0.005
> cpu_operator_cost | 0.0025
> cpu_tuple_cost | 0.01
> random_page_cost | 4
> seq_page_cost | 1
>

OK, thanks. I had overlooked the "LIMIT" in the first plan you posted, and
so thought you must have some pretty weird settings. But noticing the
LIMIT, it makes more sense with the normal settings, like the ones you show.

>
> Can you turn track_io_timing on and then report a explain (analyze,
> buffers) of the same query?
>
> I didn’t try this as our prod instance is on AWS and setting this would
> require a reboot.
>

OK, but you can still do an "explain (analyze,buffers)". It is less useful
than with track_io_timing on, but it is still more useful than just
"explain analyze".

>
> Then do a "set enable_seqscan=off" and repeat.
>
> This is the life saver! After applying this, it’s able to use the index.
> But should we consider it as the permanent solution?
>

No, probably not a permanent solution. Or at least, I only do things like
that in production as a last resort. I suggested doing that so you can
force it to use the index and so see what the explain (analyze,buffers)
look like when it does use the index. Sorry for not being more clear.

The seq scan thinks it is going to find a matching row pretty early in the
scan and can stop at the first one, but based on "Rows Removed by Filter:
796818" it isn't actually finding a match until the end. There probably
isn't much you can do about this, other than not using a LIMIT.

The reason it thinks it will find a row soon is that it thinks 0.5% of the
rows meet your criteria. That is default selectivity estimate it uses when
it has nothing better to use. Raising the statistics target on the column
might help. But I doubt it, because access tokens are probably nearly
unique, and so even the highest possible setting for statistics target is
not going get it to record MCE statistics. See
https://commitfest.postgresql.org/6/323/ for a possible solution, but any
fix for that won't be released to production for a long time.

If your gin index has a large pending list, that will make the index scan
look very expensive. vacuuming the table will clear that up. Setting
fastupdate off for the index will prevent it growing again. Based on your
description of most lists having 0 or 1 element in them, and my assumption
that a table named "access_grants" isn't getting updated hundreds of times
a second, I don't think fast_update being off is going to cause any
problems at all.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Johann Spies 2015-08-25 07:22:45 Long running query: How to monitor the progress
Previous Message Guo, Yun 2015-08-24 15:08:29 Re: query not using GIN index