Why could different data in a table be processed with different performance?

From: Vladimir Ryabtsev <greatvovan(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Why could different data in a table be processed with different performance?
Date: 2018-09-21 00:07:21
Message-ID: CAMqTPq=aFwfiqv=P9EDs+UxeOgfe03kyQ0uvtUHvSxvfSvYR+w@mail.gmail.com
Lists: pgsql-performance

I am experiencing a strange performance problem when accessing JSONB
content by primary key.

My DB version() is PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4,
64-bit
postgres.conf: https://justpaste.it/6pzz1
uname -a: Linux postgresnlpslave 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18
14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
The machine is virtual, running under Hyper-V.
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 1x1 cores
Disk storage: the host has two vmdx drives, first shared between the root
partition and an LVM PV, second is a single LVM PV. Both PVs are in a VG
containing swap and postgres data partitions. The data is mostly on the
first PV.

I have such a table:

CREATE TABLE articles
(
    article_id bigint NOT NULL,
    content jsonb NOT NULL,
    published_at timestamp without time zone NOT NULL,
    appended_at timestamp without time zone NOT NULL,
    source_id integer NOT NULL,
    language character varying(2) NOT NULL,
    title text NOT NULL,
    topics topic[] NOT NULL,
    objects object[] NOT NULL,
    cluster_id bigint NOT NULL,
    CONSTRAINT articles_pkey PRIMARY KEY (article_id)
)

We have a Python lib (using the psycopg2 driver) to access this table. It
executes simple queries against the table; one of them is used for bulk
downloading of content and looks like this:

select content from articles where article_id between $1 and $2
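
A minimal sketch of how that bulk-download loop might look (the actual library
code was not shown; the chunking helper and function names here are my
assumptions, and `conn` is expected to be an open psycopg2 connection):

```python
# Hypothetical sketch of the library's bulk-download loop; helper names
# and the chunking strategy are assumptions, not the real implementation.

def id_chunks(start, stop, step):
    """Yield inclusive (lo, hi) ID ranges covering [start, stop] in steps of `step`."""
    lo = start
    while lo <= stop:
        hi = min(lo + step - 1, stop)
        yield lo, hi
        lo = hi + 1

def download_content(conn, start_id, stop_id, step=5000):
    """Stream `content` for all articles with article_id in [start_id, stop_id]."""
    with conn.cursor() as cur:
        for lo, hi in id_chunks(start_id, stop_id, step):
            cur.execute(
                "SELECT content FROM articles WHERE article_id BETWEEN %s AND %s",
                (lo, hi),
            )
            for (content,) in cur:
                yield content
```

Usage would be along the lines of
`for doc in download_content(conn, 270000000, 270100000): process(doc)`.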

I noticed that with some IDs it works pretty fast, while with others it is
4-5 times slower. It is worth noting that there are two main 'categories'
of IDs in this table: the first is the range 270000000-500000000, and the
second is the range 10000000000-100030000000. For the first range it is
'fast', and for the second it is 'slow'. Besides the larger absolute numbers
pushing them from int into bigint territory, values in the second range are
also more 'sparse': in the first range values are almost consecutive (with
very few 'holes' of missing values), while in the second range there are
many more 'holes' (average fill is 35%). Total number of rows in the first
range: ~62M; in the second range: ~10M.

I conducted several experiments to eliminate the possible influence of the
library's code and of network throughput (I omit some of them here). I ended
up iterating over the table with EXPLAIN to simulate read load:

explain (analyze, buffers)
select count(*), sum(length(content::text)) from articles
where article_id between %s and %s

Sample output:

Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
  Buffers: shared hit=26847 read=3914
  ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
        Index Cond: ((article_id >= 438000000) AND (article_id <= 438005000))
        Buffers: shared hit=4342 read=671
Planning time: 0.393 ms
Execution time: 6626.136 ms

Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
  Buffers: shared hit=6568 read=7104
  ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
        Index Cond: ((article_id >= '100021000000'::bigint) AND (article_id <= '100021010000'::bigint))
        Buffers: shared hit=50 read=2378
Planning time: 0.517 ms
Execution time: 33219.218 ms

During iteration, I parse the result of EXPLAIN and collect a series of the
following metrics:

- buffer hits/reads for the table,
- buffer hits/reads for the index,
- number of rows (from "Index Scan..."),
- duration of execution.

Based on the metrics above, I calculate derived metrics:

- disk read rate: (index reads + table reads) * 8192 / duration,
- reads ratio: (index reads + table reads) / (index reads + table reads +
index hits + table hits),
- data rate: (index reads + table reads + index hits + table hits) * 8192 /
duration,
- rows rate: number of rows / duration.
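
The derived-metric calculations above can be sketched as follows (variable
names are mine; the inputs are the hit/read/row/duration figures parsed from
the EXPLAIN (ANALYZE, BUFFERS) output):

```python
# Sketch of the derived metrics described above. Input names are assumptions;
# the values come from parsing EXPLAIN (ANALYZE, BUFFERS) output.
BLOCK_SIZE = 8192  # default PostgreSQL block size, in bytes

def derived_metrics(index_hits, index_reads, table_hits, table_reads,
                    rows, duration_s):
    reads = index_reads + table_reads   # blocks fetched from disk
    hits = index_hits + table_hits      # blocks served from shared buffers
    return {
        # bytes actually read from disk per second
        "disk_read_rate_bps": reads * BLOCK_SIZE / duration_s,
        # fraction of block accesses that missed shared buffers
        "reads_ratio": reads / (reads + hits),
        # total bytes touched (disk + cache) per second
        "data_rate_bps": (reads + hits) * BLOCK_SIZE / duration_s,
        # rows returned per second
        "rows_rate": rows / duration_s,
    }
```

For instance, the first plan above reports 3914 reads and 26847 hits overall
for 5000 rows in ~6.6 s, giving a reads ratio of roughly 13%.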

Since the "density" of IDs differs between the "small" and "big" ranges, I
adjusted the chunk size in order to get around 5000 rows on each iteration
in both cases, though my experiments show that chunk size does not really
matter a lot.
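
The adjustment amounts to scaling the ID-range width by the fill ratio (a
sketch; the exact chunk sizes used in the tests are not shown):

```python
# Hypothetical chunk-width adjustment: with only ~35% of IDs present in the
# "big" range, a wider ID window is needed to return ~5000 rows per query.
def chunk_width(target_rows, fill_ratio):
    """ID-range width expected to return ~target_rows rows at a given fill ratio."""
    return round(target_rows / fill_ratio)

chunk_width(5000, 1.0)   # dense "small" range: ~5000-wide chunks
chunk_width(5000, 0.35)  # sparse "big" range: ~14286-wide chunks
```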

The issue posted at the very beginning of my message was confirmed for the
*whole* first and second ranges (so it was not just caused by randomly
cached data).

To eliminate cache influence, I restarted Postgres server with flushing
buffers:

$ postgresql stop; sync; echo 3 > /proc/sys/vm/drop_caches; postgresql start

After this I repeated the test and got much the same picture.

"Small" range: disk read rate is around 10-11 MB/s uniformly across the
test. Output rate was 1300-1700 rows/s. Reads ratio is around 13% (why?
Shouldn't it be ~100% after drop_caches?).
"Big" range: most of the time disk read speed was about 2 MB/s, but sometimes
it jumped to 26-30 MB/s. Output rate was 70-80 rows/s (but varied a lot and
reached 8000 rows/s). Reads ratio also varied a lot.

I rendered series from the last test into charts:
"Small" range: https://i.stack.imgur.com/3Zfml.png
"Big" range (insane): https://i.stack.imgur.com/VXdID.png

During the tests I verified disk read speed with iotop and found its readings
very close to the ones I calculated from EXPLAIN (ANALYZE, BUFFERS). I cannot
say I was monitoring it the whole time, but I confirmed it when it was
2 MB/s and 22 MB/s on the second range and 10 MB/s on the first range. I
also checked with htop that CPU was not a bottleneck: it stayed around 3%
during the tests.

The issue is reproducible on both master and slave servers. My tests were
conducted on the slave, while there was no other load on the DBMS and no
disk activity on the host unrelated to the DBMS.

My only assumption is that different fragments of data are being read with
different speed due to virtualization or something, but... why is it so
strictly bound to these ranges? Why is it the same on two different
machines?

Sequential read performance of the underlying device, measured by dd:

root(at)postgresnlpslave:/# echo 3 > /proc/sys/vm/drop_caches
root(at)postgresnlpslave:/# dd if=/dev/mapper/postgresnlpslave--vg-root of=/dev/null bs=8K count=128K
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.12304 s, 506 MB/s

Am I missing something? What else can I do to narrow down the cause?

P.S. Initially posted on
https://stackoverflow.com/questions/52105172/why-could-different-data-in-a-table-be-processed-with-different-performance

Regards,
Vlad

Responses

Justin Pryzby, 2018-09-21 00:42:32: Re: Why could different data in a table be processed with different performance?