Re: Very poor read performance, query independent

From: Charles Nadeau <charles(dot)nadeau(at)gmail(dot)com>
To: Igor Neyman <ineyman(at)perceptron(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very poor read performance, query independent
Date: 2017-07-16 09:20:57
Message-ID: CADFyZw7JZ5RGzB5aZYOtfnUzw7pT2is_+ughn3yyVkTq-XZxmQ@mail.gmail.com
Lists: pgsql-performance

Igor,

I set work_mem to 12GB, restarted PostgreSQL, repeated the same "explain
(analyze, buffers)..." as above, and the read throughput was very low, at
most 10MB/s. All the sorting operations are now done in memory.
I lowered work_mem back to 6GB, restarted PostgreSQL, repeated the same
"explain (analyze, buffers)..." as above, and the read throughput was very
low, at most 10MB/s. The 1st sorting operation is still done in memory, the
second one on disk. I think I need about 4GB to do all sorts in memory.
One thing I remembered from Friday's "explain (analyze, buffers)...": I had
set temp_buffers and work_mem to 6GB as I read both your messages one after
the other. So I decided to try again: I set work_mem=6GB, temp_buffers=6GB,
restarted PostgreSQL, repeated the same "explain (analyze, buffers)..." as
above, and the read throughput was very low again, at most 10MB/s. The 1st
sorting operation is still done in memory, the second one on disk.
For the last test, I brought work_mem and temp_buffers back to their
original values. The read throughput is still low.
In all cases, about 20 minutes after the query starts, it starts writing to
disk furiously. Here are the peak write rates as reported by dstat:
work_mem=12GB, temp_buffers=8MB: peak of 393MB/s
work_mem=6GB, temp_buffers=8MB: peak of 579MB/s
work_mem=6GB, temp_buffers=6GB: peak of 418MB/s
work_mem=1468006kB, temp_buffers=8MB: peak of 61MB/s
Also, at peak write, the server almost ran out of memory: the cache almost
drops to 0 and it starts swapping.
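
If I understand correctly, a restart shouldn't even be needed for these
tests: work_mem can be changed per session at any time, and temp_buffers too
as long as the session hasn't used any temporary tables yet. For example:

flows=# SET work_mem = '12GB';
flows=# SET temp_buffers = '6GB'; -- only effective before the session's first use of temp tables
flows=# -- then re-run the same EXPLAIN (ANALYZE, BUFFERS) query as above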

This query is a bit extreme in terms of sorting. Maybe I should try to
benchmark while counting all the records of my biggest table like Mark
Kirkwood suggested. I'll do some more tests and post the results back to
the mailing list.
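I take that suggestion to mean something as simple as:

flows=# \timing on
flows=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM flows;

while watching the read throughput with dstat or iostat.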
Thanks!

Charles

On Fri, Jul 14, 2017 at 9:13 PM, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:

> *From:* Charles Nadeau [mailto:charles(dot)nadeau(at)gmail(dot)com]
> *Sent:* Friday, July 14, 2017 11:35 AM
> *To:* Igor Neyman <ineyman(at)perceptron(dot)com>
> *Cc:* Jeff Janes <jeff(dot)janes(at)gmail(dot)com>; pgsql-performance(at)postgresql(dot)org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Igor,
>
>
>
> Initially temp_buffers was left at its default value (8MB). Watching the
> content of the directory that stores the temporary files, I found that I
> needed at most 21GB of temporary file space. Should I set temp_buffers to
> 21GB?
>
> Here is the explain you requested with work_mem set to 6GB:
>
> flows=# set work_mem='6GB';
>
> SET
>
> flows=# explain (analyze, buffers) SELECT DISTINCT
>
> srcaddr,
>
> dstaddr,
>
> dstport,
>
> COUNT(*) AS conversation,
>
> SUM(doctets) / 1024 / 1024 AS mbytes
>
> FROM
>
> flowscompact,
>
> mynetworks
>
> WHERE
>
> mynetworks.ipaddr >>= flowscompact.srcaddr
>
> AND dstaddr IN
>
> (
>
> SELECT
>
> dstaddr
>
> FROM
>
> dstexterne
>
> )
>
> GROUP BY
>
> srcaddr,
>
> dstaddr,
>
> dstport
>
> ORDER BY
>
> mbytes DESC LIMIT 50;
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------------------
>
> Limit (cost=48135680.07..48135680.22 rows=50 width=52) (actual
> time=2227678.196..2227678.223 rows=50 loops=1)
>
> Buffers: shared hit=728798038 read=82974833, temp read=381154
> written=381154
>
> -> Unique (cost=48135680.07..48143613.62 rows=2644514 width=52)
> (actual time=2227678.194..2227678.217 rows=50 loops=1)
>
> Buffers: shared hit=728798038 read=82974833, temp read=381154
> written=381154
>
> -> Sort (cost=48135680.07..48137002.33 rows=2644514 width=52)
> (actual time=2227678.192..2227678.202 rows=50 loops=1)
>
> Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
> '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
> (count(*))
>
> Sort Method: quicksort Memory: 654395kB
>
> Buffers: shared hit=728798038 read=82974833, temp
> read=381154 written=381154
>
> -> GroupAggregate (cost=48059426.65..48079260.50
> rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671
> loops=1)
>
> Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
>
> Buffers: shared hit=728798038 read=82974833, temp
> read=381154 written=381154
>
> -> Sort (cost=48059426.65..48060748.90
> rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640
> loops=1)
>
> Sort Key: flows.srcaddr, flows.dstaddr,
> flows.dstport
>
> Sort Method: external merge Disk: 3049216kB
>
> Buffers: shared hit=728798038 read=82974833,
> temp read=381154 written=381154
>
> -> Gather (cost=30060688.07..48003007.07
> rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640
> loops=1)
>
> Workers Planned: 12
>
> Workers Launched: 12
>
> Buffers: shared hit=728798037
> read=82974833
>
> -> Hash Semi Join
> (cost=30059688.07..47951761.31 rows=220376 width=20) (actual
> time=1268845.181..2007864.725 rows=7057357 loops=13)
>
> Hash Cond: (flows.dstaddr =
> flows_1.dstaddr)
>
> Buffers: shared hit=728795193
> read=82974833
>
> -> Nested Loop
> (cost=0.03..17891246.86 rows=220376 width=20) (actual
> time=0.207..723790.283 rows=37910370 loops=13)
>
> Buffers: shared hit=590692229
> read=14991777
>
> -> Parallel Seq Scan on
> flows (cost=0.00..16018049.14 rows=55094048 width=20) (actual
> time=0.152..566179.117 rows=45371630 loops=13)
>
> Buffers: shared
> hit=860990 read=14991777
>
> -> Index Only Scan using
> mynetworks_ipaddr_idx on mynetworks (cost=0.03..0.03 rows=1 width=8)
> (actual time=0.002..0.002 rows=1 loops=589831190)
>
> Index Cond: (ipaddr >>=
> (flows.srcaddr)::ip4r)
>
> Heap Fetches: 0
>
> Buffers: shared
> hit=589831203
>
> -> Hash
> (cost=30059641.47..30059641.47 rows=13305 width=4) (actual
> time=1268811.101..1268811.101 rows=3803508 loops=13)
>
> Buckets: 4194304 (originally
> 16384) Batches: 1 (originally 1) Memory Usage: 166486kB
>
> Buffers: shared hit=138102964
> read=67983056
>
> -> HashAggregate
> (cost=30059561.64..30059601.56 rows=13305 width=4) (actual
> time=1265248.165..1267432.083 rows=3803508 loops=13)
>
> Group Key:
> flows_1.dstaddr
>
> Buffers: shared
> hit=138102964 read=67983056
>
> -> Nested Loop Anti
> Join (cost=0.00..29729327.92 rows=660467447 width=4) (actual
> time=0.389..1201072.707 rows=125838232 loops=13)
>
> Join Filter:
> (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
>
> Rows Removed by
> Join Filter: 503353617
>
> Buffers: shared
> hit=138102964 read=67983056
>
> -> Seq Scan on
> flows flows_1 (cost=0.00..17836152.73 rows=661128576 width=4) (actual
> time=0.322..343152.274 rows=589831190 loops=13)
>
> Buffers:
> shared hit=138102915 read=67983056
>
> -> Materialize
> (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2
> loops=7667805470)
>
> Buffers:
> shared hit=13
>
> -> Seq
> Scan on mynetworks mynetworks_1 (cost=0.00..1.01 rows=4 width=8) (actual
> time=0.006..0.007 rows=4 loops=13)
>
>
> Buffers: shared hit=13
>
> Planning time: 0.941 ms
>
> Execution time: 2228345.171 ms
>
> (48 rows)
>
>
>
> With work_mem at 6GB, I noticed that for the first 20 minutes the query
> was running, the I/O wait was much lower, hovering around 3%, then it
> jumped to 45% until almost the end of the query.
>
>
>
> flowscompact and dstexterne are actually views. I use views to simplify
> query writing and to "abstract" queries that are used often in other
> queries. flowscompact is a view built on the table flows (about 590
> million rows); it keeps only the most often used fields.
>
> flows=# \d+ flowscompact;
>
> View "public.flowscompact"
>
> Column | Type | Modifiers | Storage | Description
>
> -----------+--------------------------+-----------+---------+-------------
>
> flow_id | bigint | | plain |
>
> sysuptime | bigint | | plain |
>
> exaddr | ip4 | | plain |
>
> dpkts | integer | | plain |
>
> doctets | bigint | | plain |
>
> first | bigint | | plain |
>
> last | bigint | | plain |
>
> srcaddr | ip4 | | plain |
>
> dstaddr | ip4 | | plain |
>
> srcport | integer | | plain |
>
> dstport | integer | | plain |
>
> prot | smallint | | plain |
>
> tos | smallint | | plain |
>
> tcp_flags | smallint | | plain |
>
> timestamp | timestamp with time zone | | plain |
>
> View definition:
>
> SELECT flowstimestamp.flow_id,
>
> flowstimestamp.sysuptime,
>
> flowstimestamp.exaddr,
>
> flowstimestamp.dpkts,
>
> flowstimestamp.doctets,
>
> flowstimestamp.first,
>
> flowstimestamp.last,
>
> flowstimestamp.srcaddr,
>
> flowstimestamp.dstaddr,
>
> flowstimestamp.srcport,
>
> flowstimestamp.dstport,
>
> flowstimestamp.prot,
>
> flowstimestamp.tos,
>
> flowstimestamp.tcp_flags,
>
> flowstimestamp."timestamp"
>
> FROM flowstimestamp;
>
> mynetworks is a table having one column and 4 rows; it contains a list of
> our networks:
>
> flows=# select * from mynetworks;
>
> ipaddr
>
> ----------------
>
> 192.168.0.0/24
>
> 10.112.12.0/30
>
> 10.112.12.4/30
>
> 10.112.12.8/30
>
> (4 rows)
>
> flows=# \d+ mynetworks;
>
> Table "public.mynetworks"
>
> Column | Type | Modifiers | Storage | Stats target | Description
>
> --------+------+-----------+---------+--------------+-------------
>
> ipaddr | ip4r | | plain | |
>
> Indexes:
>
> "mynetworks_ipaddr_idx" gist (ipaddr)
>
> dstexterne is a view listing all the destination IPv4 addresses not inside
> our network; it has one column and 3.8 million rows.
>
> flows=# \d+ dstexterne;
>
> View "public.dstexterne"
>
> Column | Type | Modifiers | Storage | Description
>
> ---------+------+-----------+---------+-------------
>
> dstaddr | ip4 | | plain |
>
> View definition:
>
> SELECT DISTINCT flowscompact.dstaddr
>
> FROM flowscompact
>
> LEFT JOIN mynetworks ON mynetworks.ipaddr >>
> flowscompact.dstaddr::ip4r
>
> WHERE mynetworks.ipaddr IS NULL;
>
> Thanks!
>
>
>
> Charles
>
>
>
> Charles,
>
>
>
> Don’t change temp_buffers.
>
> Try to increase work_mem even more, say work_mem=’12GB’, because it’s
> still using disk for sorting (starting around the 20th minute, as you noticed).
>
> See if this:
>
> “Sort Method: external merge Disk: 3049216kB”
>
> goes away.
>
> Igor
>
>
>

--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/
