Re: High cpu usage after many inserts

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Jordan Tomkinson <jordan(at)moodle(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: High cpu usage after many inserts
Date: 2009-02-24 09:17:10
Message-ID: dcc563d10902240117s62609f44kcd2fd3d7c2274853@mail.gmail.com
Lists: pgsql-general

On Tue, Feb 24, 2009 at 12:40 AM, Jordan Tomkinson <jordan(at)moodle(dot)com> wrote:
>
>
> On Tue, Feb 24, 2009 at 11:42 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>>
>> Right, the useful thing to do in this case is to take a look at how big
>> all the relations (tables, indexes) involved are at each of the steps in the
>> process.  The script at http://wiki.postgresql.org/wiki/Disk_Usage will show
>> you that.  That will give some feedback on whether the vacuum/reindex
>> methodology is really doing what you expect, and it will also let you
>> compare the size of the table/index with how much RAM is in the system.
>>
>
> taken before the ~7000 rows were entered.
>
>            relation            |  size
> -------------------------------+--------
>  public.mdl_log                | 595 MB
>  public.mdl_forum_posts        | 375 MB
>  public.mdl_log_coumodact_ix   | 197 MB
>  public.mdl_user               | 191 MB
>  public.mdl_cache_text         | 162 MB
>  public.mdl_log_usecou_ix      | 137 MB
>  public.mdl_log_act_ix         | 119 MB
>  public.mdl_log_cmi_ix         | 97 MB
>  public.mdl_log_tim_ix         | 97 MB
>  public.mdl_log_id_pk          | 97 MB
>  public.mdl_question_states    | 48 MB
>  public.mdl_stats_user_daily   | 48 MB
>  public.mdl_hotpot_responses   | 47 MB
>  public.mdl_register_downloads | 45 MB
>  public.mdl_message_read       | 37 MB
>  public.mdl_course_display     | 37 MB
>  public.mdl_stats_user_weekly  | 31 MB
>  public.mdl_mnet_log           | 27 MB
>  public.mdl_user_ema_ix        | 26 MB
>  public.mdl_regidown_url_ix    | 23 MB

What's more interesting is how quickly they grow during your test.
I'm betting that as public.mdl_log and public.mdl_forum_posts grow,
you get a dataset larger than memory.
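
If you want to watch that during the run rather than just before and
after, something along the lines of the query on that wiki page
(trimmed here to the top ten relations) run every few minutes will
show you which relations are ballooning:

  SELECT nspname || '.' || relname AS relation,
         pg_size_pretty(pg_relation_size(C.oid)) AS size
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_relation_size(C.oid) DESC
   LIMIT 10;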

There are two levels of caching that pgsql uses: the highest and
closest to pgsql is the shared_buffers cache, and the next is the
kernel-level filesystem cache. While it's still way faster to hit the
kernel's file cache than to hit the actual hard drives, the pg
shared_buffers cache is the fastest. You may be in a situation where
giving a bit more memory to pg will help, but with a 4G dataset and
8G of RAM you're cutting it close; you need a few gig left over for
sorts, per-backend memory and the like. Going to 16G of RAM you could
set shared_buffers somewhere in the 4 to 8G range and it might work
out.
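
Something like this in postgresql.conf would be the starting point on
a 16G box (ballpark numbers, not tuned to your workload, so adjust to
taste):

  shared_buffers = 4GB            # pg's own buffer cache
  effective_cache_size = 12GB     # planner hint: shared_buffers + OS cache
  maintenance_work_mem = 512MB    # helps vacuum and reindex along

Note that changing shared_buffers needs a restart to take effect.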

If you're looking at scaling to large amounts of data, you can't plan
on it all fitting into memory, and you have to start planning for
faster disk I/O. This means more disks, fast RAID controllers with
battery backed cache (listed as optional, but not really optional)
and / or kernel level software RAID, which is quite fast for
read-mostly workloads. As expensive as 16 or 24 or 32 fast hard
drives are, they're cheaper than a server with a half terabyte of RAM
or whatever you'd need to hold a big dataset in memory.

First things first, I'd try increasing shared_buffers to just over
4G. I'd also check after each run with vacuum verbose (NOT FULL) to
see how bloated the db was getting.
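
For example, picking the two tables that look like your fast growers
(any other table works the same way):

  VACUUM VERBOSE mdl_log;
  VACUUM VERBOSE mdl_forum_posts;

The output for each table says how many dead row versions it found
and removed; if that count keeps climbing from run to run even though
you're vacuuming, that's your bloat.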
