From: | Hannu Krosing <hannu(at)skype(dot)net> |
---|---|
To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jeff Trout <threshar(at)threshar(dot)is-a-geek(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Slow PITR restore |
Date: | 2007-12-14 12:32:07 |
Message-ID: | 1197635527.7974.14.camel@hannu-laptop |
Lists: | pgsql-general pgsql-hackers |
On one fine day, Thu, 2007-12-13 at 20:25, Heikki Linnakangas wrote:
...
> Hmm. That assumes that nothing else than the WAL replay will read
> pages into shared buffers. I guess that's true at the moment, but it
> doesn't seem impossible that something like Florian's read-only queries
> on a stand by server would change that.
>
> > I think that is better than both methods mentioned, and definitely
> > simpler than my brute-force method. It also lends itself to using both
> > previously mentioned methods as additional techniques if we really
> > needed to. I suspect reordering the I/Os in this way is going to make a
> > huge difference to cache hit rates.
>
> But it won't actually do anything to scale the I/O. You're still going
> to be issuing only one read request at a time. The order of those
> requests will be better from cache hit point of view, which is good, but
> the problem remains that if the modified data blocks are scattered
> around the database, you'll be doing random I/O, one request at a time.
Why one at a time?
You could build a long list of pages that need to be read in, and ask
for them all at the same time.
Here's what I mean:
1) allocate buffers for N database pages, and a queue for N WAL records
2) read N WAL records into the WAL record queue, assign database page
numbers from these to buffer pages, and issue posix_fadvise() for each
as you go
2a) if there were repeated pages and thus some buffers are still free,
allocate more queue items, read more WAL records, and keep assigning
buffers and issuing fadvise until all N buffers are used
3) process the WAL record queue against the buffers read in by 2)
4) write the buffers back to disk
repeat from 2), freeing LRU buffers
Here the reads in 2) will be optimised by the system via posix_fadvise,
and the work can also be split between multiple workers by page-number
hash or some other random/uniform means, to use more than one CPU.
-------------
Hannu