Re: xid wrap / optimize frozen tables?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Nils Goroll <slink(at)schokola(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wrap / optimize frozen tables?
Date: 2015-06-03 21:49:46
Message-ID: CAMkU=1yZu1hsxNNO3Xzuom5h+tXps5eHrDX081wiqeRuf+kMuA@mail.gmail.com
Lists: pgsql-hackers

On May 24, 2015 6:42 AM, "Nils Goroll" <slink(at)schokola(dot)de> wrote:
>
> Hi Jeff and all,
>
> On 23/05/15 22:13, Jeff Janes wrote:
> > Are you sure it is the read IO that causes the problem?
>
> Yes. The trouble here is that we are talking about a 361 GB table
>
>                             List of relations
>  Schema |            Name             |   Type   |  Owner   |    Size    | Description
> --------+-----------------------------+----------+----------+------------+-------------
>  public | *redacted*_y2015m04         | table    | postgres | 361 GB     |
>
> and while we have
>
> shared_buffers = 325GB
> huge_pages = on

As mentioned, that is a very large setting for shared_buffers.
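
If you want to check how much of each relation is actually resident in
those 325GB of shared_buffers while the vacuums run, something like this
works (a sketch, assuming the pg_buffercache extension is installed;
"yourdb" is a placeholder for your database name):

psql -d yourdb -c "
  SELECT c.relname, pg_size_pretty(count(*) * 8192) AS buffered
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  WHERE b.reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database())
  GROUP BY c.relname
  ORDER BY count(*) DESC
  LIMIT 10;"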

>
> this is not the only table of this size (total db size is 1.8 TB) and
> more current data got written to *redacted*_y2015m05 (the
> manually-partitioned table for May), so most of the m04 data would have
> got evicted from the cache when this issue surfaced initially.
>
> There is one application pushing data (bulk inserts) and we have
> transaction rates for this app in a log. The moment the vacuum started,
> these rates dropped. Unfortunately I cannot present helpful log excerpts
> here as the autovacuum never finished so far (because the admin killed
> the db), so we have zero logging about past autovac events.
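
As an aside, for next time: setting log_autovacuum_min_duration = 0 in
postgresql.conf makes autovacuum log every run, with timings and
page/tuple counts, so past autovac events leave a trace:

log_autovacuum_min_duration = 0    # log every autovacuum run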

Could you do an experiment in which you do a large sequential read on the
database files and measure the impact on the queries that way? Like:

tar -cf - data_dir | wc -c

Or better, use some fancy version that throttles to the read rate observed
below.
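
For instance, pv can throttle the read to roughly the 15 MB/s your zpool
iostat output below shows (a sketch; adjust the path and rate to taste):

tar -cf - data_dir | pv -q -L 15m > /dev/null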

> At the moment, the application is shut down and the machine is only
> running the vacs:
>
> query_start | 2015-05-22 19:33:52.44334+02
> waiting     | f
> query       | autovacuum: VACUUM public.*redacted*_y2015m04
>               (to prevent wraparound)
> query_start | 2015-05-22 19:34:02.46004+02
> waiting     | f
> query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05
>               (to prevent wraparound)
>
> so we know that any io must be caused by the vacs:
>
> shell# uptime
> 13:33:33 up 1 day, 18:01, 2 users, load average: 5.75, 12.71, 8.43

What OS is this? This load average looks very high. Does the OS count
processes that are blocked on IO against the load average?
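
On Linux, for instance, tasks in uninterruptible IO sleep (state D) do
count toward the load average; a quick way to spot them is:

ps -eo state,pid,comm | awk '$1 == "D"'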

> shell# zpool iostat
>                    capacity     operations    bandwidth
> pool             alloc   free   read  write   read  write
> ---------------  -----  -----  -----  -----  -----  -----
> tank1             358G  6.90T    872     55  15.1M  3.08M

I'm not familiar with zpool, but this shows a lot of writing going on. If
the table was already frozen and just needed to be observed as being all
frozen, then it should not be dirtying one block for every 5 blocks read.

I would not be surprised if it were the reading, not the writing, which
caused the performance problem.
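
To gauge how close you actually are to the wraparound threshold, it is
worth checking the age of each table's relfrozenxid, e.g. ("yourdb" again
a placeholder):

psql -d yourdb -c "SELECT relname, age(relfrozenxid)
                   FROM pg_class WHERE relkind = 'r'
                   ORDER BY 2 DESC LIMIT 10;"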

Cheers, Jeff
