Re: Set of related slow queries

From: John Williams <jwilliams(at)42nddesign(dot)com>
To: sthomas(at)peak6(dot)com
Cc: tv(at)fuzzy(dot)cz, pgsql-performance(at)postgresql(dot)org
Subject: Re: Set of related slow queries
Date: 2011-06-08 16:37:09
Message-ID: BANLkTikOdLzp=MqFLgViSuR2FTQcO-GaAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi All,

Let me clarify this a bit.

The memory values are ridiculous you're completely correct. I've
since fixed that (it had no effect sadly).

I've adjust the shared buffers to about 400MB. As per the tuning
guide to set that to around 1/4 of your system memory (the AWS in
question has 1.7GB). I didn't have the shared buffers set correctly
to start because truthfully I had no idea how to incurease shmmax and
I had to look that up.

The work_mem is very very high for the system It's running on
admittedly. I'm ok with leaving that though because currently I'm the
only one on the machine at all (this isn't a production set up it's a
testing setup). Realistically it's only that high because someone
suggusted trying a much higher value (I had already personally set it
to 50MB as that was about 30% larger than the largest sort I found)
and see if that improved the situation (it didn't).

Several of the implications of my current set of data make things look
a little wrong so let me clarify the issue a bit. The table is
composed of data coming from a games combat log. Each log represents
about 400k entries. Since I only really care to look at data from the
perspective of each log, the log_id is infact going to be more most
selective portion of the query. Right now the table only has two logs
in it making this hard to see. But it should reflect that the
situation shouldn't get worse over time. I will basically never be
looking at more than a 400-500k record portion of my entries table at
a time.

This stuff gets really painful because I can't very well predict the
queries so I can't pre calculate and the data isn't like a system log,
I could be accepting uploads of 100's of such logs per day. The
actual queries that are run are a function of what the user wants to
see. Their are roughly 5 or so different data views, each of which
takes 15-25 separate queries to calculate all the various graphs and
aggregates. Frequently I won't be looking at the "overall" entire log
(composed of 400k entries), instead I'll be looking at smaller slices
of the data adding: WHERE seconds_since_start <= 1500 AND seconds
since start <= 4000 or some such with very arbitrary min and max.

Now I should say I've seen almost this exact same work done before for
a different game. So I can't help but feel I must be missing
something really important either in how I'm setting up my data or how
I'm processing.

Thanks,
John

---

John Williams
42nd Design
Email: jwilliams(at)42nddesign(dot)com
Skype: druidjaidan
Phone: (520) 440-7239

On Wed, Jun 8, 2011 at 6:36 AM, Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
> On 06/08/2011 06:30 AM, tv(at)fuzzy(dot)cz wrote:
>
>>>  shared_buffers               | 16MB
>>>  work_mem                     | 250MB
>>
>> This seems a bit suspicious. Are you sure you want to keep the
>> shared_buffers so small and work_mem so large at the same time? There
>> probably are workloads where this is the right thing to do, but I doubt
>> this is the case. Why have you set it like this?
>
> I must concur in this case.  I can't imagine any scenario where this makes
> sense. Work-mem is allocated on a per-sort basis, not just per session or
> transaction. So a large query could allocate several of these and run your
> system out of memory and cause the OOM killer to start causing trouble.
>
>> I don't have much experience with running Pg on AWS, but I'd try to
>> increase the shared buffers to say 512MB and decrease the work_mem to
>> 16MB (or something like that).
>
> Easily good minimums. But it looks like your AWS only has 1GB of RAM (based
> on your effective_cache_size), so you may only want to increase it to 256MB.
> That said, reduce your work_mem to 8MB to start, and increase it in 4MB
> increments if it's still too low.
>
> With a setting of 16MB, it has to load data in and out of memory constantly.
> Even if the host OS has cached every single block you'll ever use, that's
> only the raw table contents. Processing hundreds of thousands of rows still
> takes time, you just saved yourself the effort of fetching them from disk,
> shared_buffers is still necessary to do actual work.
>
> Now... I have some issues with your queries, which are likely the fault of
> the Django ORM, but still consider your analyze:
>
>> http://explain.depesz.com/s/vEx
>
> Your bitmap index scan on logparser is hilarious. The estimates are fine.
> 237k rows in 47ms when it expected 217k. If your table really does have 815k
> rows in it, that's not very selective at all. Then it adds a heap scan for
> the remaining where conditions, and you end up with 100k rows it then has to
> sort. That's never going to be fast. 600ms actually isn't terrible for this
> many rows, and it also explains your high CPU.
>
> Then your next one:
>
>> http://explain.depesz.com/s/Rhb
>
> 700ms, mostly because of the HashAggregate caused by grouping by
> round(((seconds_since_start / 42)). You're aggregating by a calculation on
> 100k rows. Again, this will never be "fast" and 700ms is not terrible
> considering all the extra work the engine's doing. Again, your index scan
> returning everything and the kitchen sink is the root cause. Which also is
> evidenced here:
>
>> http://explain.depesz.com/s/JUo
>
> And here:
>
> http://explain.depesz.com/s/VZA
>
> Everything is being caused because it's always using the
> ogparser_entry_event_type_like index to fetch the initial 200k rows. The
> only way to make this faster is to restrict the rows coming back. For
> instance, since you know these values are coming in every day, why search
> through all of history every time?
>
> Why not get your timestamp column involved? Maybe you only need to look at
> Attack, DoT Tick, and Critical Attack event types for the last day, or week,
> or even month. That alone should drastically reduce your row counts and give
> the engine a much smaller data set to aggregate and sort.
>
> The thing is, the way your queries are currently written, as you get more
> data, this is just going to get worse and worse. Grabbing a quarter of a
> table that just gets bigger every day and then getting aggregates (group by,
> etc) is going to get slower every day unless you can restrict the result set
> with more where clauses. If you need reports on a lot of this data on a
> regular basis, consider running a nightly or hourly batch to insert them
> into a reporting table you can check later.
>
> There's a lot you can do here.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas(at)peak6(dot)com
>
> ______________________________________________
>
> See  http://www.peak6.com/email_disclaimer.php
> for terms and conditions related to this email
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-06-08 17:03:49 Re: Oracle v. Postgres 9.0 query performance
Previous Message Tony Capobianco 2011-06-08 16:33:43 Re: Oracle v. Postgres 9.0 query performance