Re: query not using GIN index

From: "Guo, Yun" <YGuo(at)cvent(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query not using GIN index
Date: 2015-08-24 14:45:16
Message-ID: D20084E2.430E3%yguo@cvent.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom,
Thanks for you valuable input. You¹re right, the plan was coming from
explain analyze SELECT "access_grants".* FROM "access_grants² WHERE
(access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) limit 1;
We tried removing "limit 1², which did give us the benefit of using index
for sometime. However, after a while, it went back to the old behavior of
ignoring the index for " SELECT "access_grants".* FROM "access_grants²
WHERE (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;² .
We had to turn off sequential scans (enable_seqscan) to force it use the
index. But I¹m not sure this should be the permanent fix.
The access_grants table has 797415 rows and the schema as below:
Column | Type |
Modifiers
-------------------------+-----------------------------+-------------------
-----------------------------------------
id | integer | not null default
nextval('access_grants_id_seq'::regclass)
user_id | integer | not null
code | text | not null
client_application_name | text | not null
access_tokens | text[] | default
'{}'::text[]
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
mongo_id | text |
Indexes:
"access_grants_pkey" PRIMARY KEY, btree (id)
"index_access_grants_on_code" UNIQUE, btree (code)
"index_access_grants_on_mongo_id" UNIQUE, btree (mongo_id)
"idx_access_grants_on_access_tokens" gin (access_tokens)
"index_access_grants_on_user_id" btree (user_id)

The array length distribution of access_token is below:
309997 rows has only one element, 248334 rows has empty array, 432 rows
has array length >100, and 1 row has array length 3575.
The table size is 154MB, and the index size is 180MB.

It¹s on AWS db.r3.xlarge instance with 4 virtual cores4, Memory30.5 GiB,
General purpose ssd, with shared_buffers 1048576 and work_mem 159744.

On 8/22/15, 12:36 AM, "pgsql-performance-owner(at)postgresql(dot)org on behalf of
Tomas Vondra" <pgsql-performance-owner(at)postgresql(dot)org on behalf of
tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>Hi,
>
>On 08/22/2015 03:55 AM, Guo, Yun wrote:
>> Hi,
>>
>> We have a query on a column with GIN index, but query plan chooses not
>> using the index but do an seq scan whichi is must slower
>>
>> CREATE INDEX idx_access_grants_on_access_tokens ON access_grants USING
>> gin (access_tokens);
>>
>> explain analyze SELECT "access_grants".* FROM "access_grants" WHERE
>> (access_tokens @> ARRAY['124e5a1f9de325fc176a7c89152ac734']) ;
>> QUERY PLAN
>>
>>-------------------------------------------------------------------------
>>-------------------------------------------------
>> Limit (cost=0.00..7.46 rows=1 width=157) (actual
>> time=260.376..260.377 rows=1 loops=1)
>> -> Seq Scan on access_grants (cost=0.00..29718.03 rows=3985
>> width=157) (actual time=260.373..260.373 rows=1 loops=1)
>> Filter: (access_tokens @>
>> '{124e5a1f9de325fc176a7c89152ac734}'::text[])
>> Rows Removed by Filter: 796818
>> Total runtime: 260.408 ms
>>
>
>I find it very likely that the explain output actually comes from a
>slightly different query, including a LIMIT 1 clause.
>
>That might easily be the problem here, because the optimizer expects the
>3985 "matches" to be uniformly distributed in the table, so it thinks
>it'll scan just a tiny fraction of the table (1/3985) until the first
>match. But it's quite possible all at rows are end of the table, and the
>executor has to actually scan the whole table.
>
>It's difficult to say without further details of the table and how the
>data are generated.
>
>> We tested on smaller table in development region and it chooses to use
>> the index there. However, in production size table it decides to ignore
>> the index for unknown reasons.
>
>Please provide explain output from that table. It's difficult to say
>what's different without seeing the details.
>
>Also please provide important details about the system (e.g. which
>PostgreSQL version, how much RAM, what work_mem/shared_buffers and such
>stuff).
>
>> Is the large number of tuples skewing the query planner¹s decision
>> or the index itself is larger than the table therefor it would decide
>> to do table scan?
>
>What large number of tuples? The indexes are supposed to be more
>efficient the larger the table is.
>
>regards
>
>--
>Tomas Vondra http://www.2ndQuadrant.com
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Guo, Yun 2015-08-24 15:08:29 Re: query not using GIN index
Previous Message Andrew Dunstan 2015-08-24 11:44:17 Re: problem with select *