From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sait Talha Nisanci <Sait(dot)Nisanci(at)microsoft(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) |
Date: | 2020-08-29 22:14:50 |
Message-ID: | 20200829221450.t7omssadp2i6bbcx@development |
Lists: | pgsql-hackers |
On Thu, Aug 27, 2020 at 04:28:54PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > > Hm? At least earlier versions didn't do prefetching for records with an fpw, and only for subsequent records affecting the same or if not in s_b anymore.
>> >
>> > We don't actually read the page when we're replaying an FPW though..?
>> > If we don't read it, and we entirely write the page from the FPW, how is
>> > pre-fetching helping..?
>>
>> Suppose there is a checkpoint. Then we replay a record with an FPW,
>> pre-fetching nothing. Then the buffer gets evicted from
>> shared_buffers, and maybe the OS cache too. Then, before the next
>> checkpoint, we again replay a record for the same page. At this point,
>> pre-fetching should be helpful.
>
>Sure- but if we're talking about 25GB of WAL, on a server that's got
>32GB, then why would those pages end up getting evicted from memory
>entirely? Particularly, enough of them to end up with such a huge
>difference in replay time..
>
>I do agree that if we've got more outstanding WAL between checkpoints
>than the system's got memory then that certainly changes things, but
>that wasn't what I understood the case to be here.
>
I don't think it's very clear how much WAL there actually was in each
case - the message only said there was more than 25GB, but who knows how
many checkpoints that covers? In the cases with FPW=on this may easily
be much less than one checkpoint (because with scale 45GB an update to
every page will log 45GB of full-page images). It'd be interesting to
see some stats from pg_waldump etc.
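
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions, not a measurement: it assumes the default 8kB block size, that every page gets modified once after a checkpoint, and that each full-page image is a whole block (ignoring hole removal, WAL compression and record headers). The helper name fpi_volume_bytes and the constants are made up for this example; the 25GB figure is just the lower bound mentioned in the earlier message.

```python
# Rough estimate only; all figures below are illustrative assumptions.
BLCKSZ = 8192          # default PostgreSQL block size, in bytes
GB = 1024 ** 3

def fpi_volume_bytes(data_bytes: int, block_size: int = BLCKSZ) -> int:
    """Upper bound on full-page image bytes logged if every page of a
    data set is modified once after a checkpoint (hypothetical helper;
    ignores hole removal, compression and record headers)."""
    pages = -(-data_bytes // block_size)   # ceiling division
    return pages * block_size

data_size = 45 * GB    # data set size mentioned in the thread
wal_seen = 25 * GB     # "more than 25GB" of WAL, taken as a lower bound

fpi = fpi_volume_bytes(data_size)
fraction = wal_seen / fpi

print(f"FPI volume for one full pass: about {fpi / GB:.0f} GB")
print(f"25 GB of WAL is about {fraction:.2f} of that")
```

Under these assumptions 25GB of WAL would be well short of one full pass over the data, which is why the number of checkpoints covered matters so much here.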

>> Admittedly, I don't quite understand whether that is what is happening
>> in this test case, or why SSD vs. HDD should make any difference. But
>> there doesn't seem to be any reason why it doesn't make sense in
>> theory.
>
>I agree that this could be a reason, but it doesn't seem to quite fit in
>this particular case given the amount of memory and WAL. I'm suspecting
>that it's something else and I'd very much like to know if it's a
>general "this applies to all (most? a lot of?) SSDs because the
>hardware has a larger than 8KB page size and therefore the kernel has to
>read it", or if it's something odd about this particular system and
>doesn't apply generally.
>
Not sure. I doubt it has anything to do with the hardware page size;
that's mostly transparent to the kernel anyway. But it might be that
prefetching on a particular SSD has more overhead than what it saves.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services