From: Bryce Covert <bryce(at)brycecovertoperations(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query due to slow I/O
Date: 2013-12-12 21:24:46
Message-ID: 52AA299E.4030406@brycecovertoperations.com
Lists: pgsql-performance
Also, I was reading this:
http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2, and I
realized that index-only scans weren't introduced until 9.2. I tried
creating a covering index for this, but I don't think it helps in this
situation.
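
For reference, a covering index for this query would look something like the
electricusage_covered_id_from_date_usage index already in the schema below:

```sql
-- Covers the join key, the date filter, and the summed column. On 9.2+
-- this could enable an index-only scan (given a recent VACUUM); on 9.1
-- every matching row still has to visit the heap.
CREATE INDEX electricusage_covered_id_from_date_usage
    ON electricusage (premise_account_id, from_date, usage);
```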
Bryce
> Claudio Freire <mailto:klaussfreire(at)gmail(dot)com>
> December 12, 2013 12:48 PM
>
>
>
> Well, did you run benchmarks? How many IOPS do you get from the volumes?
>
> Try running "iostat -x -m -d 10" while the slow query is running and
> pasting the results (or a relevant sample of them).
>
> Also, do run "explain (analyze, buffers)" instead of plain "explain
> analyze".
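> For the query in this thread, that invocation would look like:

```sql
-- Same query as below, but with buffer statistics, so the shared hit/read
-- counts show how much of the work actually came from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT premiseaccount.id, SUM(electricusage.usage) AS total_kwh
FROM premiseaccount
LEFT OUTER JOIN electricusage
  ON premiseaccount.id = electricusage.premise_account_id
WHERE premiseaccount.bucket = 'XXX'
  AND electricusage.from_date >= '2012-11-20 00:00:00'
GROUP BY premiseaccount.id
HAVING SUM(electricusage.usage) BETWEEN 3284 AND 3769
LIMIT 50;
```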
>
> Bryce Covert <mailto:bryce(at)brycecovertoperations(dot)com>
> December 12, 2013 12:20 PM
>
> I don't have much info on disks, since this is a virtual server on
> Linode. It is running Ubuntu 12.04, 8 CPUs, 4GB RAM, and a 95GB ext3
> volume (noatime). Hopefully that's helpful.
>
> Bryce
> Claudio Freire <mailto:klaussfreire(at)gmail(dot)com>
> December 12, 2013 12:15 PM
> On Thu, Dec 12, 2013 at 5:10 PM, Bryce Covert wrote:
>
> For this kind of diagnostic, you need to include hardware details.
>
> OS? Disks? RAID?
>
>
> Bryce Covert <mailto:bryce(at)brycecovertoperations(dot)com>
> December 12, 2013 12:10 PM
>
> Hi,
>
> I'm seeing a slow running query. After some optimization with indexes,
> it appears that the query plan is correct, it's just slow. Running the
> query twice, not surprisingly, is very fast, due to OS caching or
> shared_buffers caching. If the parameters for the query are different,
> however, the query is slow until run a second time. In our usage
> pattern, this query must run fast the first time it runs.
>
> A couple of general stats: this is a Linode machine with a single 3GB
> DB and 4GB of RAM. shared_buffers = 1024MB,
> effective_cache_size = 2048MB. We are running PostgreSQL 9.1. The
> machine is otherwise dormant when this query runs. Here is the schema:
>
> Table "public.premiseaccount"
>    Column   |           Type           | Modifiers
> ------------+--------------------------+----------------------------------------------------------------------
>  id         | integer                  | not null default nextval('premiseaccount_id_seq'::regclass)
>  created    | timestamp with time zone | not null default '2013-09-25 07:00:00+00'::timestamp with time zone
>  modified   | timestamp with time zone | not null default '2013-09-25 07:00:00+00'::timestamp with time zone
>  account_id | integer                  | not null
>  premise_id | integer                  | not null
>  bucket     | character varying(255)   |
> Indexes:
>     "premiseaccount_pkey" PRIMARY KEY, btree (id)
>     "premiseaccount_account_id" btree (account_id)
>     "premiseaccount_bucket" btree (bucket)
>     "premiseaccount_bucket_58c70392619aa36f" btree (bucket, id)
>     "premiseaccount_bucket_like" btree (bucket varchar_pattern_ops)
>     "premiseaccount_premise_id" btree (premise_id)
> Foreign-key constraints:
>     "account_id_refs_id_529631edfff28022" FOREIGN KEY (account_id) REFERENCES utilityaccount(id) DEFERRABLE INITIALLY DEFERRED
>     "premise_id_refs_id_5ecea2842007328b" FOREIGN KEY (premise_id) REFERENCES premise(id) DEFERRABLE INITIALLY DEFERRED
>
> Table "public.electricusage"
>        Column       |           Type           | Modifiers
> --------------------+--------------------------+----------------------------------------------------------------------
>  id                 | integer                  | not null default nextval('electricusage_id_seq'::regclass)
>  created            | timestamp with time zone | not null default '2013-09-25 07:00:00+00'::timestamp with time zone
>  modified           | timestamp with time zone | not null default '2013-09-25 07:00:00+00'::timestamp with time zone
>  from_date          | timestamp with time zone | not null
>  to_date            | timestamp with time zone | not null
>  usage              | numeric(9,2)             |
>  demand             | numeric(9,2)             |
>  bill_amount        | numeric(9,2)             |
>  premise_account_id | integer                  | not null
> Indexes:
>     "electricusage_pkey" PRIMARY KEY, btree (id)
>     "electricusage_premise_account_id" btree (premise_account_id)
>     "electricusage_covered_id_from_date_usage" btree (premise_account_id, from_date, usage)
> Foreign-key constraints:
>     "premise_account_id_refs_id_4c39e54406369128" FOREIGN KEY (premise_account_id) REFERENCES premiseaccount(id) DEFERRABLE INITIALLY DEFERRED
>
>
> For reference, premiseaccount has about 1 million rows, one for each
> account, grouped in buckets with an average of 5000 accounts per
> bucket. electricusage has 10 million rows, about 10 rows per
> premiseaccount.
>
> Here is the query:
>
> explain analyze
> SELECT premiseaccount.id, SUM(electricusage.usage) AS total_kwh
> FROM premiseaccount
> LEFT OUTER JOIN electricusage
> ON premiseaccount.id = electricusage.premise_account_id
> WHERE premiseaccount.bucket = 'XXX' AND electricusage.from_date >=
> '2012-11-20 00:00:00'
> GROUP BY premiseaccount.id
> HAVING SUM(electricusage.usage) BETWEEN 3284 and 3769
> LIMIT 50;
>
>
> Limit (cost=0.00..1987.24 rows=50 width=8) (actual time=931.524..203631.435 rows=50 loops=1)
>   -> GroupAggregate (cost=0.00..179487.78 rows=4516 width=8) (actual time=931.519..203631.275 rows=50 loops=1)
>        Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
>        -> Nested Loop (cost=0.00..179056.87 rows=36317 width=8) (actual time=101.934..203450.761 rows=30711 loops=1)
>             -> Index Scan using premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount (cost=0.00..14882.30 rows=4516 width=4) (actual time=77.620..7199.527 rows=3437 loops=1)
>                  Index Cond: ((bucket)::text = 'XXX'::text)
>             -> Index Scan using electricusage_premise_account_id_36bc8999ced10059 on electricusage electricusage (cost=0.00..36.24 rows=9 width=8) (actual time=8.607..57.055 rows=9 loops=3437)
>                  Index Cond: ((premise_account_id = premiseaccount.id) AND (from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone))
> Total runtime: 203631.666 ms
>
> (see online at: http://explain.depesz.com/s/zeC)
>
> What am I missing here? It seems like this is a relatively
> straightforward foreign key join, using the correct indexes, that is
> just slow. Warming the OS cache seems like the only way to make this
> actually perform reasonably, but it seems like it's masking the
> underlying problem. I could probably do some denormalization, like
> putting the bucket field on the electricusage table, but that seems
> like a shoddy solution.
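>
> A sketch of that denormalization, for concreteness (the added column and
> index name here are hypothetical):

```sql
-- Hypothetical denormalization: copy bucket onto electricusage so the
-- range scan touches a single table's index instead of a nested-loop join.
ALTER TABLE electricusage ADD COLUMN bucket character varying(255);

UPDATE electricusage eu
   SET bucket = pa.bucket
  FROM premiseaccount pa
 WHERE pa.id = eu.premise_account_id;

CREATE INDEX electricusage_bucket_from_date
    ON electricusage (bucket, from_date);
```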
>
> Before running my query, I clear the os and postgres cache. I know
> that in theory some subset of the data will be in the os cache and
> postgres cache. But in the real world, what we see is that a number of
> people get timeouts waiting for this query to finish, because it's not
> in the cache. e.g., running the query with where bucket='xxx' will be
> fast a second time, but where bucket='yyy' will not.
>
>
> Here are things I've tried:
> + Clustered premiseaccount on premiseaccount_bucket_58c70392619aa36f
> and electricusage on electricusage_covered_id_from_date_usage. This
> provided a modest improvement.
> + Fit the whole DB in memory. Since this query is fast when the data is
> cached, I tried to fit everything into the cache: bumping the RAM to
> 8GB and warming up the OS cache by tarring the postgres data directory.
> On the 4GB machine I still run into slow performance, but at 8GB the
> entire DB fits in the OS cache. This made a huge improvement, but it
> will only work for a little while as the database grows.
> + Tried other Linode servers - same results.
> + Removed the from_date part of the query. It uses a different index
> but has the same performance characteristics.
> + Looked at kern, sys, and postgres logs. Nothing interesting.
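>
> In SQL terms, that clustering step was along these lines (the follow-up
> ANALYZE is assumed, since CLUSTER rewrites the tables):

```sql
-- Rewrite both tables in index order so rows for one bucket / one premise
-- account are physically adjacent, reducing random I/O on a cold cache.
CLUSTER premiseaccount USING premiseaccount_bucket_58c70392619aa36f;
CLUSTER electricusage USING electricusage_covered_id_from_date_usage;
ANALYZE premiseaccount;
ANALYZE electricusage;
```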
>
>
> I would really appreciate any help I could get! Thanks!
>
> Bryce
>
>
Next message: Claudio Freire | 2013-12-12 21:35:51 | Re: Slow query due to slow I/O
Previous message: Bryce Covert | 2013-12-12 21:16:26 | Re: Slow query due to slow I/O