From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Wong <markw(at)osdl(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Separate BLCKSZ for data and logging |
Date: | 2006-03-16 20:51:54 |
Message-ID: | 1142542314.3859.534.camel@localhost.localdomain |
Lists: | pgsql-hackers |
On Thu, 2006-03-16 at 12:22 -0800, Mark Wong wrote:
> I was hoping that in the case where 2 or more data blocks are written to
> the log that they could be written once within a single larger log block.
> The log block size must be larger than the data block size, of course.
I think Tom's right: the OS block size is smaller than BLCKSZ, so
reducing the WAL block size might help under a very high transaction
load, when commits are required very frequently. At checkpoint it sounds
like we might benefit from a larger WAL block size because of all the
additional blocks written, but we often write more than one block at a
time anyway, and that still translates to multiple OS blocks whichever
way you cut it, so I'm not convinced yet.
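To put rough numbers on the small-commit case, here is a back-of-envelope sketch. It assumes every commit flushes, as whole blocks, each WAL block its record touches, and it ignores page headers and any OS-level coalescing. The function name and the record sizes are made up for illustration; this is not the actual XLOG code.

```python
def bytes_flushed(record_sizes, wal_block_size):
    """Total bytes written if each commit flushes every WAL block its
    record touches, always as whole blocks (simplifying assumption)."""
    total = 0
    offset = 0  # current insert position in the WAL stream
    for size in record_sizes:
        first_block = offset // wal_block_size
        end = offset + size
        last_block = (end - 1) // wal_block_size
        # Whole blocks are written, even if only part of one changed.
        total += (last_block - first_block + 1) * wal_block_size
        offset = end
    return total


# 64 small commits of 128 bytes each: 8K of WAL in total.
commits = [128] * 64
print(bytes_flushed(commits, 8192))  # 524288: the same 8K block rewritten 64 times
print(bytes_flushed(commits, 512))   # 32768: each commit rewrites one 512-byte block
```

Under these assumptions the smaller block cuts the bytes written for a stream of tiny commits by the ratio of the block sizes. It says nothing about the checkpoint case, where records carry whole data blocks.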
On Thu, 2006-03-16 at 15:21 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Overall, the two things are fairly separate, apart from the fact that we
> > do currently log whole data blocks straight to the log. Usually just
> > one, but possibly two or three. So I have a feeling that things would
> > become less efficient if you did this, not more.
>
> > But it's a good line of thought and I'll have a look at that.
>
> I too think reducing the size of WAL blocks might be a win, because
> we currently always write whole blocks, and so a series of small
> transactions will be rewriting the same 8K block multiple times.
> If the filesystem's native block size is less than 8K, matching that
> size should theoretically make things faster.
Might it be possible to do this: when committing, if the current WAL
page is less than half full, wait for a single spin-lock cycle and then
do the write? (By spin-lock cycle I mean that on a single CPU we wait
zero, and on a multi-CPU machine we wait a while.) This is effectively a
modification of the group commit idea, but instead of waiting every time
we wait only when it is write-efficient to do so. (And we'd make that
optional, too.) We could then ditch the remnant of the group-commit code.
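As a minimal sketch of the decision rule only, and not a patch: the function name, the `enabled` switch and the 8K page size below are assumptions for illustration, and the actual wait (one spin-lock cycle) is left out.

```python
WAL_PAGE_SIZE = 8192  # assumed page size, for illustration only


def should_delay_commit(bytes_used_in_page, num_cpus,
                        enabled=True, page_size=WAL_PAGE_SIZE):
    """Return True if a committing backend should wait briefly before
    writing, in the hope that other commits fill up the same page."""
    if not enabled:
        return False  # the behaviour is meant to be optional
    if num_cpus <= 1:
        return False  # single CPU: nobody else can add records while we spin
    # Only wait when the page is less than half full, i.e. when writing
    # now would flush a mostly empty page.
    return bytes_used_in_page < page_size // 2


print(should_delay_commit(1000, num_cpus=4))  # page mostly empty, multi-CPU
print(should_delay_commit(1000, num_cpus=1))  # single CPU
print(should_delay_commit(5000, num_cpus=4))  # page already more than half full
```

The point of the half-full test is that the wait is skipped whenever the write is already reasonably efficient, so the delay is not paid on every commit.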
Best Regards, Simon Riggs