From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Jeff Trout" <threshar(at)threshar(dot)is-a-geek(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow PITR restore
Date: 2007-12-13 06:27:44
Message-ID: 874pen2jof.fsf@oxford.xeocode.com
Lists: pgsql-general pgsql-hackers
"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>
>> Exactly. Which is the point I am making. Five minutes of transactions
>> is nothing (speaking generally). In short, if we are in recovery, and
>> we are not saturating the I/O and at least a single CPU, there is a huge
>> amount of optimization *somewhere* to be done.
>
> You sure about that? I tested CVS HEAD just now, by setting the
> checkpoint_ parameters really high, running pgbench for awhile, and
> then killing the bgwriter to force a recovery cycle over all the WAL
> generated by the pgbench run. What I saw was that the machine was 100%
> disk write bound. Increasing shared_buffers helped, not in that the
> write rate got less according to vmstat, but the completion time did.
There are at least three definitions of "saturating the I/O" and it sounds
like you two are using two different ones.
1) The processor is waiting on I/O all the time
2) The hard drives are all always handling a request
3) The hard drives are all transferring data at the full bandwidth they're capable of
You would expect (1) and (2) to be the same for a single drive -- though in
practice there seems to be a gap even between them. But for a RAID array there
can be a large difference, and the wider the RAID stripe the larger the
difference.
In Tom's results:
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  9  70024  29232  19876 824368    0    0     0  3152 1447  233  0  1  0 99  0
>  0  9  70024  29232  19876 824368    0    0     0  3660 1474  252  0  1  0 99  0
>  0  8  70024  28960  19876 824404    0    0     0  3176 1448  265  0  2  1 97  0
>
> I don't see the machine sitting around doing nothing ...
Note that even though the processor is 99% in wait state, the drive is only
handling about 3-3.5 MB/s of writes. That translates into a seek time of
roughly 2.2-2.5 ms per write, which is actually pretty fast. So if this is a
single drive, (1) and (2) seem to be pretty much the same here.
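(Spelling out the arithmetic, on the assumption that vmstat's bo column is in
1 kB blocks and these are random 8 kB Postgres page writes: 3200-3600 kB/s
divided by 8 kB per page is roughly 400-450 writes per second, which is about
2.2-2.5 ms per write.)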
But note that if this were a RAID array Postgres wouldn't be getting any
better results. A RAID array wouldn't improve I/O latency at all, and since
recovery is already spending 99% of its time waiting for I/O it isn't going to
be able to issue any more requests. Only one drive in the array would be busy
at a time, so the whole array would be capped at the same ~400-450 random
writes per second a single drive can do -- far less than the random-access I/O
the array as a whole is capable of.
Heikki proposed a while back to use posix_fadvise() when processing the logs,
to read ahead the blocks which recovery will need before actually attempting
to replay them. On a RAID array that would bring the 3 MB/s above up to the
maximum number of random accesses the array can handle (i.e., definition (2)
above).
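Roughly the sort of thing I mean -- just a sketch of the idea, not actual
Postgres code: the BlockRef struct, the file names, and the step that decodes
upcoming WAL records are all made up for illustration. The point is simply to
hand the kernel a batch of posix_fadvise(POSIX_FADV_WILLNEED) hints so several
drives can be seeking at once while the single recovery process replays
records one at a time.

#define _XOPEN_SOURCE 600           /* expose posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>

#define BLCKSZ 8192                 /* Postgres data block size */

/* Hypothetical: one block reference decoded from an upcoming WAL record. */
typedef struct
{
    const char *relpath;            /* relation file the record touches */
    long        blkno;              /* block number within that file */
} BlockRef;

/*
 * Ask the kernel to start fetching one block we know recovery will need.
 * With many of these outstanding, a RAID array can have several drives
 * seeking in parallel instead of just one.
 */
static void
prefetch_block(const BlockRef *ref)
{
    int         fd = open(ref->relpath, O_RDONLY);

    if (fd < 0)
        return;                     /* file may not exist yet; just skip it */

    (void) posix_fadvise(fd, (off_t) ref->blkno * BLCKSZ,
                         BLCKSZ, POSIX_FADV_WILLNEED);
    close(fd);
}

int
main(void)
{
    /* Stand-in for "decode the next batch of WAL records ahead of the
     * replay point and collect the blocks they reference". */
    BlockRef    upcoming[] = {
        {"base/16384/16385", 42},
        {"base/16384/16385", 4711},
        {"base/16384/16390", 7},
    };
    size_t      i;

    for (i = 0; i < sizeof(upcoming) / sizeof(upcoming[0]); i++)
        prefetch_block(&upcoming[i]);

    /* ... then replay those records; the reads should now either hit the
     * OS cache or at least overlap with one another. */
    return 0;
}

Whether the per-block open()/close() and fadvise calls are cheap enough would
need measuring, and you'd want to keep file descriptors around rather than
reopening them, but it shows the shape of the thing.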
That's still going to be a far cry from the maximum bandwidth the drives can
handle -- definition (3). Even that single drive can probably do 60 MB/s of
sequential I/O. That gap is probably the source of the unrealistic
expectations people have: it's easy to treat the bandwidth number as the
headline measure of I/O utilization, when for random access it's the seek
rate that matters.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!