Performance of a nested loop, whose inner loop uses an index scan.

From: negora <public(at)negora(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance of a nested loop, whose inner loop uses an index scan.
Date: 2016-10-19 10:54:46
Message-ID: 5cb92dc3-af70-3cb1-5b2b-87ccc31f62cb@negora.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello:

I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind to
help me, please?

I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousands each one. The query that I'm performing is very simple,
although retrieves lots of rows:

---------------------
SELECT answers.*
FROM answers
JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
JOIN courses ON courses.course_id = test_completions.course_id
WHERE courses.group_id = 2;
---------------------

This yields the following plan:

---------------------
Nested Loop (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)
-> Hash Join (cost=245.36..539.81 rows=3081 width=8) (actual
time=1.077..6.087 rows=3123 loops=1)
Hash Cond: (test_completions.course_id =
courses.course_id)
-> Seq Scan on test_completions (cost=0.00..214.65
rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
-> Hash (cost=204.11..204.11 rows=3300 width=8)
(actual time=1.063..1.063 rows=3300 loops=1)
Buckets: 4096 Batches: 1 Memory Usage:
161kB
-> Bitmap Heap Scan on courses
(cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777
rows=3300 loops=1)
Recheck Cond: (group_id = 2)
Heap Blocks: exact=117
-> Bitmap Index Scan on
fki_courses_group_id_fkey (cost=0.00..45.03 rows=3300 width=0) (actual
time=0.172..0.172 rows=3300 loops=1)
Index Cond:
(group_id = 2)
### HERE ###
-> Index Scan using fki_answers_test_completion_id_fkey on
answers (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558
rows=2852 loops=3123)
### HERE ###
Index Cond: (test_completion_id =
test_completions.test_completion_id)
Planning time: 0.523 ms
Execution time: 2805.530 ms
---------------------

My doubt is about the inner loop of the nested loop, the one that I've
delimited with ### HERE ### . This loop is the part that, obviously,
more time consumes. Because its run 3,123 times and requires lots of
accesses to multiple database pages. But, Is there anything that I can
do to reduce even more the time spent in this part? Apart of:

* Clustering the "answers" table.
* Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
* Upgrading the hardware.

Thank you!

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matheus de Oliveira 2016-10-19 11:15:59 Re: Performance of a nested loop, whose inner loop uses an index scan.
Previous Message Sam Gendler 2016-10-18 22:53:54 Re: Should I generate strings in Postgres of Python?