Re: Avoid using index scan backward when limit order desc

From: Melvin Davidson <melvin6925(at)gmail(dot)com>
To: Christophe Escobar <christophe(dot)esco(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Avoid using index scan backward when limit order desc
Date: 2016-12-19 18:01:48
Message-ID: CANu8Fiz8=tN2dV8Tp9qskLdX-+gicgTXtjBr6Xh46aJrDtf_nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Dec 19, 2016 at 12:05 PM, Christophe Escobar <
christophe(dot)esco(at)gmail(dot)com> wrote:

> Hi,
>
> I am using PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc
> (Debian 4.9.2-10) 4.9.2, 64-bit.
>
> I have a notification table with about ~45 000 000 rows.
>
> I have some performance issues when trying to fetch rows from the table
> with a specific query, I suspect the planner to choose the wrong index
> because of the limit.
> The query look like this: SELECT * FROM notifications WHERE bucket_id IN
> (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
> LIMIT 20;
> It returns 0 rows for this example.
>
> The table looks like this with the indices:
>
> Table "notifications"
> Column | Type |
> Modifiers
> ----------------------------+-----------------------------+-
> -----------------------------------------------------------
> id | integer | not null
> default nextval('notifications_id_seq'::regclass)
> account_id | integer |
> version_id | integer |
> item_id | integer |
> type | character varying(255) |
> created_at | timestamp without time zone |
> updated_at | timestamp without time zone |
> meta_data | text |
> bucket_id | integer |
> Indexes:
> "notifications_pkey" PRIMARY KEY, btree (id)
> "index_notifications_on_account_id" btree (account_id)
> "index_notifications_on_bucket_id" btree (bucket_id)
> "index_notifications_on_item_id" btree (item_id)
> "index_notifications_on_created_at_and_bucket_id" btree (created_at,
> bucket_id)
> "index_notifications_on_type_and_bucket_id" btree (type, bucket_id)
> "index_notifications_on_version_id" btree (version_id)
>
> Before testing, I have run a VACUUM ANALYZE, and here are the statistics
> on the indices:
>
> SELECT relname, relkind, reltuples, relpages
> FROM pg_class
> WHERE relname LIKE '%notifications%';
> relname | relkind | reltuples
> | relpages
> ---------------------------------------------------+--------
> -+-------------+----------
> notifications_pkey | i | 4.55221e+07
> | 124820
> index_notifications_on_account_id | i |
> 4.55221e+07 | 124820
> index_notifications_on_bucket_id | i |
> 4.55221e+07 | 124819
> index_notifications_on_item_id | i |
> 4.55221e+07 | 124821
> index_notifications_on_version_id | i |
> 4.55221e+07 | 124821
> index_notifications_on_created_at_and_bucket_id | i |
> 4.55221e+07 | 175281
> index_notifications_on_type_and_bucket_id | i |
> 4.55221e+07 | 188281
> notifications_id_seq | S | 1
> | 1
> notifications | r | 4.55225e+07
> | 566412
>
>
> I tried three different EXPLAIN ANALYZE, on a subset of my table (with the
> entire table, I have yet to see what is the total duration of the query
> when using LIMIT 20, but it takes more than 5 minutes which is not
> acceptable for my use case).
>
> ** Without limit **
>
> EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
> (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC;
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------
> Sort (cost=7474.92..7480.08 rows=2061 width=187) (actual
> time=0.149..0.149 rows=0 loops=1)
> Sort Key: created_at
> Sort Method: quicksort Memory: 25kB
> -> Bitmap Heap Scan on notifications (cost=71.68..7361.47 rows=2061
> width=187) (actual time=0.136..0.136 rows=0 loops=1)
> Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND
> (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
> -> Bitmap Index Scan on index_notifications_on_type_and_bucket_id
> (cost=0.00..71.16 rows=2061 width=0) (actual time=0.135..0.135 rows=0
> loops=1)
> Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND
> (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
> Planning time: 0.715 ms
> Execution time: 0.198 ms
>
> ** With LIMIT 20 **
>
> EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
> (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
> limit 20;
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------------------
> Limit (cost=0.43..3341.84 rows=20 width=187) (actual
> time=60133.701..60133.701 rows=0 loops=1)
> -> Index Scan Backward using index_notifications_on_created_at_and_bucket_id
> on notifications (cost=0.43..344332.66 rows=2061 width=187) (actual
> time=60133.695..60133.695 rows=0 loops=1)
> Filter: (((type)::text = ANY ('{foo,bar}'::text[])) AND
> (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
> Rows Removed by Filter: 3441510
> Planning time: 1.034 ms
> Execution time: 60133.740 ms
>
> ** With limit 50 **
>
> EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN
> (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC
> limit 50;
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------
> Limit (cost=7429.94..7430.06 rows=50 width=187) (actual
> time=0.111..0.111 rows=0 loops=1)
> -> Sort (cost=7429.94..7435.09 rows=2061 width=187) (actual
> time=0.110..0.110 rows=0 loops=1)
> Sort Key: created_at
> Sort Method: quicksort Memory: 25kB
> -> Bitmap Heap Scan on notifications (cost=71.68..7361.47
> rows=2061 width=187) (actual time=0.107..0.107 rows=0 loops=1)
> Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[]))
> AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
> -> Bitmap Index Scan on index_notifications_on_type_and_bucket_id
> (cost=0.00..71.16 rows=2061 width=0) (actual time=0.105..0.105 rows=0
> loops=1)
> Index Cond: (((type)::text = ANY
> ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::intege
> r[])))
> Planning time: 0.151 ms
> Execution time: 0.139 ms
>
>
> As you can see, when I have the LIMIT 20, the execution time takes around
> 1 minutes (on a very small subset of the entire table).
> Actually I have tried different LIMIT, and when the LIMIT is <= 45, it
> will use the index scan backward.
>
> Removing the index 'index_notifications_on_created_at_and_bucket_id' may
> prevent the planner from choosing the index scan backward for this query,
> but this index is used for other querying on that table...
>
> 1) Why is the planner changing index scanning at the threshold of 45 for
> the LIMIT ? Why not 50 ? 100 ? I may take the solution in my application to
> have a LIMIT > 45 in order to prevent the performance issue, but am I sure
> that this threshold will always be the same ?
>
> 2) Is it possible for a specific query to force the planner on choosing a
> given index or preventing it from choosing one ?
>
> What kind of other options do I have to solve this performance issue ?
>
> Thanks in advance for any help,
>
> Regards,
>
> --
> Christophe Escobar
>

*You can temporarily disable index scanning for a session with*

*SET enable_indexscan = off;*

*and/orSET enable_indexonlyscan = off;*
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-12-19 18:04:58 Re: Windows installation - could not connect to server: Connection refused (0x0000274D/10061)
Previous Message Adrian Klaver 2016-12-19 17:55:15 Re: Is there a way to Send attachments with email using pgmail postgreSQl?