Re: Slow query due to slow I/O

From: Bryce Covert <bryce(at)brycecovertoperations(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query due to slow I/O
Date: 2013-12-13 00:03:09
Message-ID: 52AA4EBD.1040800@brycecovertoperations.com
Lists: pgsql-performance

It looks like you guys were right: vacuum analyzing the table got it to
do an index-only scan. It seems like materialized views are going to be
the best bet, since I can see how they would allow sequential reading.
Thanks!

Bryce
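
For anyone following up on Bryce's conclusion: materialized views arrived in PostgreSQL 9.3, so on the 9.2 instance discussed below the same idea would be a manually refreshed summary table. A minimal sketch, reusing the table and column names from the plans in this thread (the view name, index, date, and bucket value are illustrative):

```sql
-- Precompute per-account usage sums so the hot query becomes a
-- sequential read of a small, physically ordered relation instead of
-- random heap fetches. Names below mirror the EXPLAIN output; the
-- cutoff date is the one used in the thread's queries.
CREATE MATERIALIZED VIEW premise_usage_summary AS
SELECT pa.id     AS premise_account_id,
       pa.bucket AS bucket,
       sum(eu.usage) AS total_usage
FROM premiseaccount pa
JOIN electricusage eu ON eu.premise_account_id = pa.id
WHERE eu.from_date >= '2012-11-20 00:00:00+00'
GROUP BY pa.id, pa.bucket;

-- Supports "bucket = ? AND total_usage BETWEEN ? AND ?" lookups.
CREATE INDEX ON premise_usage_summary (bucket, total_usage);

-- Rebuild after batch loads; the view is not kept up to date
-- automatically.
-- REFRESH MATERIALIZED VIEW premise_usage_summary;
```

Because a refresh rewrites the view from scratch, this suits data that changes in batches rather than continuously.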

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> December 12, 2013 4:01 PM
> On Thu, Dec 12, 2013 at 3:04 PM, Bryce Covert
> <bryce(at)brycecovertoperations(dot)com> wrote:
>
> Not sure if this is helpful, but I tried upgrading to 9.2, and
> here's what I got:
>
> ---------
> Limit (cost=0.00..535.78 rows=50 width=8) (actual time=1037.376..135043.945 rows=50 loops=1)
>   Output: premiseaccount.id, (sum(electricusage.usage))
>   Buffers: shared hit=4851 read=18718
>   -> GroupAggregate (cost=0.00..198012.28 rows=18479 width=8) (actual time=1037.369..135043.700 rows=50 loops=1)
>         Output: premiseaccount.id, sum(electricusage.usage)
>         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
>         Rows Removed by Filter: 1476
>         Buffers: shared hit=4851 read=18718
>         -> Nested Loop (cost=0.00..196247.46 rows=148764 width=8) (actual time=107.092..134845.231 rows=15188 loops=1)
>               Output: premiseaccount.id, electricusage.usage
>               Buffers: shared hit=4851 read=18718
>               -> Index Only Scan using premiseaccount_bucket_58c70392619aa36f on public.premiseaccount premiseaccount (cost=0.00..43135.13 rows=18479 width=4) (actual time=45.368..137.340 rows=1527 loops=1)
>                     Output: premiseaccount.bucket, premiseaccount.id
>                     Index Cond: (premiseaccount.bucket = '85375_single-family'::text)
>                     Heap Fetches: 1527
>
> You had to hit the heap for every row, meaning the index-only feature
> was useless. Are you vacuuming enough? How fast does this table
> change? What is relallvisible from pg_class for these tables?
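
Jeff's relallvisible question can be answered straight from the catalog; a minimal sketch, assuming the table names shown in the plan:

```sql
-- An index-only scan skips heap fetches only for pages marked
-- all-visible in the visibility map; pg_class tracks that count.
SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1)
         AS pct_all_visible
FROM pg_class
WHERE relname IN ('premiseaccount', 'electricusage');
```

A low pct_all_visible explains the 1527 heap fetches above: every index hit still had to visit the heap to check tuple visibility.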
>
>               -> Index Scan using electricusage_premise_account_id on public.electricusage electricusage (cost=0.00..8.20 rows=9 width=8) (actual time=22.306..88.136 rows=10 loops=1527)
>                     Output: electricusage.id, electricusage.created, electricusage.modified, electricusage.from_date, electricusage.to_date, electricusage.usage, electricusage.demand, electricusage.bill_amount, electricusage.premise_account_id
>                     Index Cond: (electricusage.premise_account_id = premiseaccount.id)
>                     Filter: (electricusage.from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
>                     Rows Removed by Filter: 2
>                     Buffers: shared hit=4850 read=18033
> Total runtime: 135044.256 ms
> (23 rows)
>
>
> Looks like it is doing an index-only scan for the first table, but
> not for the second. I tried creating two indexes that theoretically
> should make it not have to go to the physical table:
>
>     "electricusage_premise_account_id_36bc8999ced10059" btree (premise_account_id, from_date, usage)
>     "ix_covered_2" btree (premise_account_id, from_date DESC, usage, id)
>
> Any idea why it's not using them?
>
>
> If the other IOS in this plan is anything to go by, then your table
> doesn't have enough all-visible pages to make an index-only scan
> worthwhile. So the planner chooses the smaller index instead of the
> bigger one that could theoretically support an IOS.
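
When too few pages are all-visible, a manual vacuum is the usual way to rebuild the visibility map (autovacuum does the same over time); a sketch using the thread's table names:

```sql
-- VACUUM sets the all-visible bits the planner consults when costing
-- index-only scans; ANALYZE refreshes statistics in the same pass.
VACUUM (ANALYZE) public.premiseaccount;
VACUUM (ANALYZE) public.electricusage;
```

On a mostly append-only table this can flip the plan to the covering index; on a heavily updated one the benefit decays as pages are dirtied again.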
>
> Cheers,
>
> Jeff
> Bryce Covert <bryce(at)brycecovertoperations(dot)com>
> December 12, 2013 3:04 PM
> Not sure if this is helpful, but I tried upgrading to 9.2, and here's
> what I got:
>
> ---------
> Limit (cost=0.00..535.78 rows=50 width=8) (actual time=1037.376..135043.945 rows=50 loops=1)
>   Output: premiseaccount.id, (sum(electricusage.usage))
>   Buffers: shared hit=4851 read=18718
>   -> GroupAggregate (cost=0.00..198012.28 rows=18479 width=8) (actual time=1037.369..135043.700 rows=50 loops=1)
>         Output: premiseaccount.id, sum(electricusage.usage)
>         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
>         Rows Removed by Filter: 1476
>         Buffers: shared hit=4851 read=18718
>         -> Nested Loop (cost=0.00..196247.46 rows=148764 width=8) (actual time=107.092..134845.231 rows=15188 loops=1)
>               Output: premiseaccount.id, electricusage.usage
>               Buffers: shared hit=4851 read=18718
>               -> Index Only Scan using premiseaccount_bucket_58c70392619aa36f on public.premiseaccount premiseaccount (cost=0.00..43135.13 rows=18479 width=4) (actual time=45.368..137.340 rows=1527 loops=1)
>                     Output: premiseaccount.bucket, premiseaccount.id
>                     Index Cond: (premiseaccount.bucket = '85375_single-family'::text)
>                     Heap Fetches: 1527
>                     Buffers: shared hit=1 read=685
>               -> Index Scan using electricusage_premise_account_id on public.electricusage electricusage (cost=0.00..8.20 rows=9 width=8) (actual time=22.306..88.136 rows=10 loops=1527)
>                     Output: electricusage.id, electricusage.created, electricusage.modified, electricusage.from_date, electricusage.to_date, electricusage.usage, electricusage.demand, electricusage.bill_amount, electricusage.premise_account_id
>                     Index Cond: (electricusage.premise_account_id = premiseaccount.id)
>                     Filter: (electricusage.from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
>                     Rows Removed by Filter: 2
>                     Buffers: shared hit=4850 read=18033
> Total runtime: 135044.256 ms
> (23 rows)
>
>
> Looks like it is doing an index-only scan for the first table, but not
> for the second. I tried creating two indexes that theoretically should
> make it not have to go to the physical table:
>
>     "electricusage_premise_account_id_36bc8999ced10059" btree (premise_account_id, from_date, usage)
>     "ix_covered_2" btree (premise_account_id, from_date DESC, usage, id)
>
> Any idea why it's not using them?
>
> Thanks!
> Bryce
>
> Claudio Freire <klaussfreire(at)gmail(dot)com>
> December 12, 2013 1:35 PM
>
> On Thu, Dec 12, 2013 at 6:16 PM, Bryce Covert
> <bryce(at)brycecovertoperations(dot)com> wrote:
>
>
> Thanks a lot for the help. I'm not familiar with explain buffers,
> but here's the results:
>
> Limit (cost=0.00..648.71 rows=50 width=8) (actual time=653.681..52328.707 rows=50 loops=1)
>   Buffers: shared hit=7875 read=9870
>   -> GroupAggregate (cost=0.00..55672.36 rows=4291 width=8) (actual time=653.671..52328.480 rows=50 loops=1)
>         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
>         Buffers: shared hit=7875 read=9870
>         -> Nested Loop (cost=0.00..55262.93 rows=34506 width=8) (actual time=432.129..52200.465 rows=30711 loops=1)
>               Buffers: shared hit=7875 read=9870
>               -> Index Scan using premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780 rows=3437 loops=1)
>                     Index Cond: ((bucket)::text = '85349_single-family'::text)
>                     Buffers: shared hit=744 read=2692
>               -> Index Scan using electricusage_premise_account_id on electricusage electricusage (cost=0.00..9.17 rows=9 width=8) (actual time=11.430..12.235 rows=9 loops=3437)
>                     Index Cond: (premise_account_id = premiseaccount.id)
>                     Filter: (from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
>                     Buffers: shared hit=7131 read=7178
> Total runtime: 52329.028 ms
> (15 rows)
>
> and the iostat results...
> Linux 3.11.6-x86_64-linode35 (preview-aps-new)   12/12/2013   _x86_64_   (8 CPU)
>
> ...
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       0.10   47.10  152.40   7.80   2.20   0.21    30.92     1.36   8.50    8.03   17.53   6.14  98.33
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
>
>
> This means it's doing random I/O, and that your disk is a single
> 7200RPM drive (152 r/s being typical for that hardware).
>
> You can improve this by either:
>
> 1 - Turning that random I/O pattern into sequential, or
> 2 - Getting better I/O.
>
> I'll assume 2 isn't available to you on Linode, so for 1, you could
> try lowering effective_cache_size substantially. It seems you're not
> getting nearly as much caching as you think (i.e. the 2GB it is set
> to). However, I doubt there's a plan that can get you significantly
> better performance given your hardware.
>
> You may shave a few seconds, though, if you increase work_mem. It
> seems it should have used a bitmap index scan for at least one of the
> index scans there, and a low work_mem could be what's limiting the
> planner's possibilities. What are your settings in that area?
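
A sketch of how the settings Claudio mentions can be inspected and experimented with per session; the values shown are illustrative, not recommendations:

```sql
-- Current values from the running server.
SHOW work_mem;
SHOW effective_cache_size;

-- Session-local experiment; reverts when the connection closes, so
-- postgresql.conf is untouched. A larger work_mem lets the planner
-- consider bitmap heap scans and hashed aggregation.
SET work_mem = '64MB';
SET effective_cache_size = '512MB';

-- Re-run the query under EXPLAIN (ANALYZE, BUFFERS) to compare plans.
```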
>
>
> Bryce Covert <bryce(at)brycecovertoperations(dot)com>
> December 12, 2013 1:16 PM
> Hey Claudio,
>
> Thanks a lot for the help. I'm not familiar with explain buffers, but
> here's the results:
>
> Limit (cost=0.00..648.71 rows=50 width=8) (actual time=653.681..52328.707 rows=50 loops=1)
>   Buffers: shared hit=7875 read=9870
>   -> GroupAggregate (cost=0.00..55672.36 rows=4291 width=8) (actual time=653.671..52328.480 rows=50 loops=1)
>         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
>         Buffers: shared hit=7875 read=9870
>         -> Nested Loop (cost=0.00..55262.93 rows=34506 width=8) (actual time=432.129..52200.465 rows=30711 loops=1)
>               Buffers: shared hit=7875 read=9870
>               -> Index Scan using premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780 rows=3437 loops=1)
>                     Index Cond: ((bucket)::text = '85349_single-family'::text)
>                     Buffers: shared hit=744 read=2692
>               -> Index Scan using electricusage_premise_account_id on electricusage electricusage (cost=0.00..9.17 rows=9 width=8) (actual time=11.430..12.235 rows=9 loops=3437)
>                     Index Cond: (premise_account_id = premiseaccount.id)
>                     Filter: (from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
>                     Buffers: shared hit=7131 read=7178
> Total runtime: 52329.028 ms
> (15 rows)
>
> and the iostat results...
> Linux 3.11.6-x86_64-linode35 (preview-aps-new)   12/12/2013   _x86_64_   (8 CPU)
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       6.94   65.68   12.16   7.64   0.36   0.29    67.06     0.44  22.11    3.56   51.66   1.15   2.28
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00    10.61     0.00   1.82    1.82    0.00   1.82   0.00
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       0.10   47.10  152.40   7.80   2.20   0.21    30.92     1.36   8.50    8.03   17.53   6.14  98.33
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       0.20    2.30  212.10   0.70   3.22   0.01    31.09     1.47   6.88    6.90    2.86   4.63  98.62
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       0.20    8.30  183.20   5.10   2.46   0.05    27.31     1.68   8.85    6.68   86.78   5.24  98.70
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> xvda       0.10    0.00  165.70   0.00   2.36   0.00    29.20     1.46   8.86    8.86    0.00   5.95  98.63
> xvdb       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
> I'm not sure how to interpret this, except that 2.5 MB/s seems really
> slow, and it looks like it is using Postgres's cache quite a bit.
>
> Thanks,
> Bryce
>
>
> Claudio Freire <klaussfreire(at)gmail(dot)com>
> December 12, 2013 12:48 PM
>
> Well, did you run benchmarks? How many IOPS do you get from the volumes?
>
> Try running "iostat -x -m -d 10" while the slow query is running and
> pasting the results (or a relevant sample of them).
>
> Also, do run "explain (analyze, buffers)" instead of plain "explain
> analyze".
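
The shape of that request, with a stand-in query built from the table names used elsewhere in the thread (the actual query is Bryce's):

```sql
-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports, per plan
-- node, how many shared-buffer pages were hits vs. reads from disk --
-- which is what distinguishes a caching problem from an I/O problem.
EXPLAIN (ANALYZE, BUFFERS)
SELECT pa.id, sum(eu.usage)
FROM premiseaccount pa
JOIN electricusage eu ON eu.premise_account_id = pa.id
WHERE pa.bucket = '85349_single-family'
GROUP BY pa.id
LIMIT 50;
```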
>
