Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Lonni J Friedman <netllama(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Colson <andy(at)squeakycode(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1
Date: 2012-05-23 19:36:44
Message-ID: 4FBD3C4C.2010409@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 24/05/12 05:09, Lonni J Friedman wrote:
> On Wed, May 23, 2012 at 9:37 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Lonni J Friedman<netllama(at)gmail(dot)com> writes:
>>> After banging my head on the wall for a long time, I happened to
>>> notice that khugepaged was consuming 100% CPU every time autovacuum
>>> was running. I did:
>>> echo "madvise"> /sys/kernel/mm/transparent_hugepage/defrag
>>> and immediately the entire problem went away.
>> Fascinating.
> In hindsight, sure. Before that, it was 2 days of horror.
>
>>> So this looks like a nasty Fedora16 kernel bug to me, or maybe
>>> postgresql& Fedora16's default kernel settings are just not
>>> compatible?
>> I agree, kernel bug. What kernel version are you using exactly?
> I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates.
>
>>> Is anyone else using Fedora16& PostgreSQL-9.1 ?
>> I use an F16 box daily, but can't claim to have done major performance
>> testing with it. Can you put together a summary of your nondefault
>> Postgres settings? I wonder whether it only kicks in for a certain
>> size of shared memory for instance.
> Oh yea, I'm quite certain that this is somehow related to my setup,
> and not a generic problem with all F16/pgsql systems. For starters,
> this problem isn't happening on any of the 3 standby systems, which
> are all otherwise identical to the master in every respect. Also when
> we had done some testing (prior to the upgrades), we never ran into
> any of these problems. However our test environment was on smaller
> scale hardware, with a much smaller number of clients (and overall
> load).
>
> Here are the non default settings in postgresql.conf :
> wal_level = hot_standby
> archive_mode = on
> archive_timeout = 61
> max_wal_senders = 10
> wal_keep_segments = 5000
> hot_standby = on
> log_autovacuum_min_duration = 2500
> autovacuum_max_workers = 4
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.7
> effective_cache_size = 88GB
> work_mem = 576MB
> wal_buffers = 16MB
> checkpoint_segments = 64
> shared_buffers = 8GB
> max_connections = 350
>
> Let me know if you have any other questions. I'd be happy to provide
> as much information as possible if it can aid in fixing this bug.
>

I think they will need details of things like: RAM, number/type
processors, number & type
of disks, disk controllers & any other hardware specs that might be
relevant etc.- at very
least: total RAM & number of spindles

Also anything else running on the box.

Plus transaction load pattern - over time and read/write ratios.

type/nature of queries

size of heavily accessed tables and their indexes

Apologies, if you have already supplied the above!

Take the above with a grain of salt, I have read about performance
issues in databases,
but never had to chase issues myself. Apart from once where I reduced
the query time
of one particular query down from 30+ minutes to about 10 seconds PROGRESS
4GL/Oracle db (not relevant here)!

Cheers,
Gavin

(who has the luxury of designing a very sophisticated db with very low
transaction rates,
so doesn't need to worry about performance issues)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark Dilger 2012-05-23 20:04:14 Re: FATAL: lock file "postmaster.pid" already exists
Previous Message Tom Lane 2012-05-23 19:23:42 Re: FATAL: lock file "postmaster.pid" already exists