Re: Query RE: Optimising UUID Lookups

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Roland Dunn <roland(dot)dunn(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query RE: Optimising UUID Lookups
Date: 2015-03-24 11:43:10
Message-ID: CAK-MWwQgu5=W1_X1u0bsKd3hhucFAw46_AKTwmq-GpVif+qvxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> (1) SELECT uuid FROM lookup WHERE state = 200 LIMIT 4000;
>
> OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
> ------------------------------------------------
> Limit (cost=0.00..4661.02 rows=4000 width=16) (actual
> time=0.009..1.036 rows=4000 loops=1)
> Buffers: shared hit=42
> -> Seq Scan on lookup (cost=0.00..1482857.00 rows=1272559
> width=16) (actual time=0.008..0.777 rows=4000 loops=1)
> Filter: (state = 200)
> Rows Removed by Filter: 410
> Buffers: shared hit=42
> Total runtime: 1.196 ms
> (7 rows)
>
> Question: Why does this do a sequence scan and not an index scan when
> there is a btree on state?
>

very likely that state=200 is very common value in the table
so seq scan of few pages (42 to be exact) is faster than performing index
scan.​

> (2) SELECT article_data.id, article_data.uuid, article_data.title,
> article_data.text FROM article_data WHERE uuid = ANY
> ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,..... 3999 more uuid's
> ....}'::uuid[]);
>
>
> OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
> ------------------------------------------------
> Index Scan using article_data_uuid_key on article_data
> (cost=5.56..34277.00 rows=4000 width=581) (actual time=0.063..66029.031
> rows=4000 loops=1)
> Index Cond: (uuid = ANY
> ​
> (
> ​'...'
> ::uuid[]))
>
Buffers: shared hit=16060
> ​​
> read=4084 dirtied=292
> Total runtime: 66041.443 ms Question:

​>>​
Why is this so slow, even though it's reading from disk?

As I already suggested enable track_io_timing in the database and use
explain (analyze, costs, buffer, timing)
to see how much exactly time had been spent during IO operations.

The time requred for single random IO operation for common HDD's are around
10ms, so reading ​read=4084 pages could easily took 60seconds especially if
some other IO activity exist on the server.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Feike Steenbergen 2015-03-25 12:04:20 Index only scan sometimes switches to sequential scan for small amount of rows
Previous Message Roland Dunn 2015-03-24 10:45:26 Re: Query RE: Optimising UUID Lookups