Re: Planner doesn't take indexes into account

From: AI Rumman <rummandba(at)gmail(dot)com>
To: Grzegorz Olszewski <grzegorz(dot)olszewski(at)outlook(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Planner doesn't take indexes into account
Date: 2014-05-27 21:14:21
Message-ID: CAGoODpfb_3yWrN5N4P-8fL5Q0Tz0ZOqSL-HiVLrY0FCE=_9b9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

What is random_page_cost and seq_page_cost in your server?
And how many rows does the table have?

On Tue, May 27, 2014 at 2:09 PM, Grzegorz Olszewski <
grzegorz(dot)olszewski(at)outlook(dot)com> wrote:

> Hi,
>
> I wonder why planner uses Seq Scan instead of Index Scan.
>
> Here is my table (partial):
> content.contents
>
> -------------------------+-----------------------------+-----------------------------------------------------------------
> id | bigint | niepusty
> domyślnie nextval('content.contents_id_seq'::regclass)
> version | integer | niepusty
> date_published | timestamp without time zone |
> moderation_status | character varying(50) |
> publication_status | character varying(30) |
>
> And indexes (there are some other indexes too):
> "contents_id_pkey" PRIMARY KEY, btree (id)
> "contents_date_published_idx" btree (date_published)
> "contents_moderation_status_idx" btree (moderation_status)
> "contents_publication_status_idx" btree (publication_status)
>
> I tried also creating following indexes:
> "contents_date_published_publication_status_moderation_statu_idx"
> btree (date_published, publication_status, moderation_status)
> "contents_publication_status_idx1" btree ((publication_status::text))
> "contents_moderation_status_idx1" btree ((moderation_status::text))
>
> Then for this query (genrated by Hibernate):
> explain (analyze, buffers) select count(*) as y0_ from content.contents
> this_ inner join content.content_categories cat1_ on
> this_.CONTENT_CATEGORY_ID=cat1_.ID where cat1_.name in ([...])
> and this_.date_published<='2014-05-26 12:23:31.557000 +02:00:00'
> and (this_.PUBLICATION_STATUS is null or this_.PUBLICATION_STATUS<>'DRAFT')
> and (this_.moderation_status is null or
> this_.moderation_status<>'DANGEROUS')
> and exists(select * from content.content_visibilities cv where
> cv.content_id = this_.ID and cv.user_group_id in (1,2));
>
> Planner creates such plan:
>
> QUERY
> PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Hash Semi Join (cost=31706.84..106020.81 rows=21871 width=2076) (actual
> time=1197.658..6012.406 rows=430218 loops=1)
> Hash Cond: (this_.id = cv.content_id)
> Buffers: shared hit=5 read=59031 written=3, temp read=47611
> written=47549
> -> Hash Join (cost=2.22..56618.11 rows=22881 width=2076) (actual time=
> 0.163..1977.304 rows=430221 loops=1)
> Hash Cond: (this_.content_category_id = cat1_.id)
> Buffers: shared hit=1 read=46829 written=1
> -> Seq Scan on contents this_ (cost=0.00..54713.92 rows=446176
> width=2030) (actual time=0.048..915.724 rows=450517 loops=1)
> Filter: ((date_published <= '2014-05-26
> 12:23:31.557'::timestamp without time zone) AND ((publication_status IS
> NULL) OR ((publication_status)::text <> 'DRAFT'::text)) AND
> ((moderation_status IS NULL) OR ((moderation_status)::text <>
> 'DANGEROUS'::text)))
> Rows Removed by Filter: 50
> Buffers: shared read=46829 written=1
> -> Hash (cost=2.17..2.17 rows=4 width=46) (actual
> time=0.089..0.089 rows=4 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 1kB
> Buffers: shared hit=1
> -> Seq Scan on content_categories cat1_ (cost=0.00..2.17
> rows=4 width=46) (actual time=0.053..0.076 rows=4 loops=1)
> Filter: ((name)::text = ANY
> ('{przeglad-prasy/rp,przeglad-prasy/parkiet,komunikat-z-rynku-pap-emitent,komunikat-z-rynku-pap-depesze}'::text[]))
> Rows Removed by Filter: 74
> Buffers: shared hit=1
> -> Hash (cost=24435.09..24435.09 rows=443083 width=8) (actual
> time=1197.146..1197.146 rows=447624 loops=1)
> Buckets: 4096 Batches: 32 Memory Usage: 560kB
> Buffers: shared hit=4 read=12202 written=2, temp written=1467
> -> Bitmap Heap Scan on content_visibilities cv
> (cost=7614.55..24435.09 rows=443083 width=8) (actual time=61.034..647.729
> rows=447624 loops=1)
> Recheck Cond: (user_group_id = ANY ('{1,2}'::bigint[]))
> Buffers: shared hit=4 read=12202 written=2
> -> Bitmap Index Scan on
> content_visibilities_user_group_id_idx (cost=0.00..7503.78 rows=443083
> width=0) (actual time=58.680..58.680 rows=447626 loops=1)
> Index Cond: (user_group_id = ANY ('{1,2}'::bigint[]))
> Buffers: shared hit=3 read=1226
> Total runtime: 6364.689 ms
> (27 wierszy)
>
> The suspicious part is:
> -> Seq Scan on contents this_ (cost=0.00..54713.92 rows=446176
> width=2030) (actual time=0.048..915.724 rows=450517 loops=1)
> Filter: ((date_published <= '2014-05-26
> 12:23:31.557'::timestamp without time zone) AND ((publication_status IS
> NULL) OR ((publication_status)::text <> 'DRAFT'::text)) AND
> ((moderation_status IS NULL) OR ((moderation_status)::text <>
> 'DANGEROUS'::text)))
>
> I don't understand why planner doesn't use indexes. The problem is there
> are about 0.5M rows satisfying condition (almost every row in the table).
> Could you please explain this behavior?
>
> I'm using PostgreSQL 9.2.8 on Ubuntu 12.04 LTS x86_64
>
> Best regards,
> Grzegorz Olszewski
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Albe Laurenz 2014-05-28 07:50:40 Re: NFS, file system cache and shared_buffers
Previous Message Grzegorz Olszewski 2014-05-27 21:09:45 Planner doesn't take indexes into account