Re: [rfc] overhauling pgstat.stat

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [rfc] overhauling pgstat.stat
Date: 2013-09-04 18:42:52
Message-ID: 52277F2C.508@fuzzy.cz
Lists: pgsql-hackers

On 4.9.2013 14:43, Pavel Stehule wrote:
>
>
>
> 2013/9/4 Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
>
>
>
> Sent from my iPad
>
> On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
>
> > (2013/09/04 15:23), Atri Sharma wrote:
> >>
> >>
> >> Sent from my iPad
> >>
> >> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
> >>
> >>> Hi,
> >>>
> >>> (2013/09/04 12:52), Atri Sharma wrote:
> >>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I'm considering overhauling pgstat.stat, and would like to know
> >>>>> how many people are interested in this topic.
> >>>>>
> >>>>> As you may know, this file can be hundreds of MB in size, because
> >>>>> pgstat.stat holds all access statistics for each database, and an
> >>>>> entire pgstat.stat needs to be read and written frequently.
> >>>>>
> >>>>> As a result, pgstat.stat often generates massive I/O, particularly
> >>>>> when there is a large number of tables in the database.
> >>>>>
> >>>>> To support multi-tenancy, or just a large number of tables (up to
> >>>>> 10k tables in a single database), I think pgstat.stat needs to be
> >>>>> overhauled.
> >>>>>
> >>>>> I think using a heap and a btree in pgstat.stat would be preferable,
> >>>>> to reduce reads/writes and to allow updating the access statistics
> >>>>> for specific tables in the pgstat.stat file.
> >>>>>
> >>>>> Is this good for us?
> >>>>
> >>>> Hi,
> >>>>
> >>>> Nice thought. I/O reduction in pgstat can be really helpful.
> >>>>
> >>>> I am trying to think of our aim here. Would we be looking to split
> >>>> pgstat per table, so that the I/O write happens for only a portion
> >>>> of pgstat? Or reduce the I/O in general?
> >>>
> >>> I prefer the latter.
> >>>
> >>> Under the current implementation, DBAs need to split a single
> >>> database into many smaller databases, taking the access locality of
> >>> the tables into account. That's difficult, and the locality could
> >>> change in the future.
> >>>
> >>> And splitting the statistics data into many files (per table,
> >>> for example) would cause another performance issue when
> >>> collecting/showing statistics at once. Just my guess though.
> >>>
> >>> So, I'm looking for a new way to reduce I/O for the statistics data
> >>> in general.
> >>>
> >>> Regards,
> >>>
> >>>>
> >>>> If the latter, how would using a BTree help us? I would rather go
> >>>> for a range tree or something. But again, I may be completely wrong.
> >>>>
> >>>> Please elaborate a bit more on the solution we are trying to
> >>>> achieve. It seems really interesting.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Atri
> >>
> >> Right,thanks.
> >>
> >> How would using a heap and a BTree help here? Are we looking at a
> >> priority queue supporting the main storage system of the stats?
> >
> > For example, when you read only a single block from your table, all
> > the values in your database statistics have to be written out
> > afterwards. That often generates a large amount of I/O.
> >
> > However, if random access were allowed in the statistics, you could
> > update only a single record for the specific table you read. That
> > would be less than 100 bytes per table.
> >
> > I have no idea how a priority queue could work here so far. However,
> > if the statistics file is overhauled, PostgreSQL would be able to
> > host a much larger number of customers.
>
>
> Ah, now I get it. Thanks a ton for the detailed explanation.
>
> Yes, a BTree would sufficiently isolate per-table stats here and
> allow for random access.
>
> Another thing I can think of is having a write-back cache which could
> probably be used as a buffer before the actual stats write. I am just
> musing here, though.
>
>
> we very successfully use a tmpfs volume for the pgstat files (using a
> backport of the multiple stats files feature from 9.3 to 9.1).

It works quite well as long as you have the objects (tables, indexes,
functions) spread across multiple databases. Once you have one database
with a very large number of objects, tmpfs is not as effective.

It's going to help with stats I/O, but it's not going to help with high
CPU usage (you're still reading and parsing the stat files over and
over), and every rewrite creates a copy of the file. So if you have
400MB of stats, you need an 800MB tmpfs plus some slack (say, 200MB).
That means you'll use a ~1GB tmpfs although 400MB of data would be just
fine, and those extra 600MB won't be available for the page cache etc.
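
For reference, the tmpfs setup described above boils down to something
like this (the mount point and the 1GB size are just examples, sized
per the "2x the stats plus some slack" rule; stats_temp_directory is
the GUC that relocates the temporary stats file and takes effect on
reload):

    # /etc/fstab -- size ~ 2x the stats file plus slack
    tmpfs  /var/lib/pgsql/pg_stat_tmp  tmpfs  size=1G,mode=0700  0 0

    # after mounting, make it writable by the server:
    #   chown postgres:postgres /var/lib/pgsql/pg_stat_tmp

    # postgresql.conf
    stats_temp_directory = '/var/lib/pgsql/pg_stat_tmp'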

OTOH, it's true that if you have that many objects, 600MB of RAM is not
going to help you anyway.
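
To make the random-access idea upthread a bit more concrete, here is a
minimal sketch of a fixed-width stats heap updated in place, so that
touching one table writes a few dozen bytes instead of rewriting the
whole file. The record layout, the file name, and the linear-scan
lookup are hypothetical placeholders (a real version would keep a btree
or hash index next to the heap); this is not the actual pgstat.stat
format:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

typedef struct TabStatRecord    /* fixed width, so it can be updated in place */
{
    uint32_t    tableoid;       /* key */
    uint64_t    blocks_fetched;
    uint64_t    blocks_hit;
    uint64_t    tuples_returned;
    uint64_t    tuples_inserted;
} TabStatRecord;

/*
 * Find the file offset of the record for a given table.  A real
 * implementation would consult a btree/hash index kept next to the
 * heap; a linear scan is enough to demonstrate the in-place write.
 */
static off_t
find_record_offset(int fd, uint32_t tableoid)
{
    TabStatRecord rec;
    off_t       off = 0;

    while (pread(fd, &rec, sizeof(rec), off) == (ssize_t) sizeof(rec))
    {
        if (rec.tableoid == tableoid)
            return off;
        off += sizeof(rec);
    }
    return -1;                  /* not found, caller appends */
}

/* Update (or append) a single table's record without rewriting the file. */
static int
update_table_stats(int fd, const TabStatRecord *rec)
{
    off_t       off = find_record_offset(fd, rec->tableoid);

    if (off < 0)
        off = lseek(fd, 0, SEEK_END);   /* new table: append */

    if (pwrite(fd, rec, sizeof(*rec), off) != (ssize_t) sizeof(*rec))
        return -1;
    return 0;
}

int
main(void)
{
    TabStatRecord rec = {.tableoid = 16384, .blocks_fetched = 1};
    int         fd = open("tabstats.bin", O_RDWR | O_CREAT, 0600);

    if (fd < 0 || update_table_stats(fd, &rec) < 0)
    {
        perror("tabstats");
        return 1;
    }
    close(fd);
    return 0;
}

The point is just that a pwrite() to a known offset replaces the
rewrite-everything cycle; the hard parts are maintaining the index and
crash safety, both of which this sketch ignores.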

Tomas
