From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [rfc] overhauling pgstat.stat
Date: 2013-09-08 23:19:31
Message-ID: 522D0603.4040407@fuzzy.cz
Lists: pgsql-hackers
On 8.9.2013 23:04, Jeff Janes wrote:
> On Tue, Sep 3, 2013 at 10:09 PM, Satoshi Nagayasu <snaga(at)uptime(dot)jp>
> wrote:
>> Hi,
>>
>>
>> (2013/09/04 13:07), Alvaro Herrera wrote:
>>>
>>> Satoshi Nagayasu wrote:
>>>
>>>> As you may know, this file could be hundreds of MB in size,
>>>> because pgstat.stat holds all access statistics for each
>>>> database, and it needs to read/write the entire pgstat.stat
>>>> file frequently.
>>>>
>>>> As a result, pgstat.stat often generates massive I/O operation,
>>>> particularly when having a large number of tables in the
>>>> database.
>>>
>>>
>>> We already changed it:
>>>
>>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
>>> Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
>>> Date:   Mon Feb 18 17:56:08 2013 -0300
>>>
>>>     Split pgstat file in smaller pieces
>>
>> Thanks for the comments. I forgot to mention that.
>>
>> Yes, we have already split the single pgstat.stat file into several
>> pieces.
>>
>> However, we still need to read/write a large amount of statistics
>> data when we have a large number of tables in a single database, or
>> multiple databases being accessed. Right?
>
> Do you have a test case for measuring this? I vaguely remember from
> when I was testing the split patch that, after that improvement, the
> remaining load was so low that there was little point in optimizing
> it further.
This is actually a pretty good point. Creating a synthetic test case is
quite simple - just create a million tables in a single database - but
I'm wondering whether it's actually realistic. Do we have a real-world
example where the current "one stat file per db" approach is not enough?
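
For the record, generating that synthetic case is something like this
(the table names and the count are arbitrary, and in practice you'd
want to run it in batches so the lock table doesn't explode, as a DO
block runs in a single transaction):

    DO $$
    BEGIN
        FOR i IN 1..1000000 LOOP
            EXECUTE format('CREATE TABLE stat_test_%s (id int)', i);
        END LOOP;
    END $$;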
The reason why I worked on the split patch is that our application is
slightly crazy and creates a lot of tables (+ indexes) on the fly, and
as we have up to a thousand databases on each host, we often ended up
with a huge stat file.
Splitting the stat file improved that considerably, although that's
partially because we have the stats on a tmpfs, so I/O is not a problem,
and the CPU overhead is negligible thanks to splitting the stats per
database.
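
In case anyone wants to replicate that setup, it's roughly this (the
paths and sizes are just examples, not a recommendation):

    # /etc/fstab - RAM-backed filesystem for the stats files
    tmpfs  /var/run/pg_stats_tmp  tmpfs  size=512M,uid=postgres,gid=postgres  0 0

    # postgresql.conf - point the stats collector there
    stats_temp_directory = '/var/run/pg_stats_tmp'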
But AFAIK there are operating systems where creating a filesystem in RAM
is not that simple - e.g. Windows. In such cases even a moderate number
of objects may be a significant I/O issue. But then again, I can't
really think of a reasonable system creating that many objects in a
single database (except e.g. a shared database using schemas instead
of databases).
Tomas