From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WALInsertLock tuning
Date: 2011-06-07 16:12:27
Message-ID: BANLkTik3a60e6cE+1B5UU6gLXotL8-7d+w@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jun 7, 2011 at 4:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 07.06.2011 10:55, Simon Riggs wrote:
>>> How would that help?
>
>> It doesn't matter whether the pages are zeroed while they sit in memory.
>> And if you write a full page of WAL data, any wasted bytes at the end of
>> the page don't matter, because they're ignored at replay anyway. The
>> possibility of mistaking random garbage for valid WAL only occurs when
>> we write a partial WAL page to disk. So, it is enough to zero the
>> remainder of the partial WAL page (or just the next few words) when we
>> write it out.
>
>> That's a lot cheaper than fully zeroing every page. (except for the fact
>> that you'd need to hold WALInsertLock while you do it)
>
> I think avoiding the need to hold both locks at once is probably exactly
> why the zeroing was done where it is.
>
> An interesting alternative is to have XLogInsert itself just plop down a
> few more zeroes immediately after the record it's inserted, before it
> releases WALInsertLock. This will be redundant work once the next
> record gets added, but it's cheap enough to not matter IMO. As was
> mentioned upthread, zeroing out the bytes that will eventually hold the
> next record's xl_prev field ought to be enough to maintain a guarantee
> that we won't believe the next record is valid.
Let's see what the overheads are with a continuous stream of short WAL
records, say xl_heap_delete records.
The xl header is 32 bytes and xl_heap_delete is 24 bytes, so there would
be ~145 records per page. A 12-byte zeroing overhead per record gives
1740 zero bytes written per page in total.
In the worst case that's less than 25% of the current overhead, plus
it's spread out across multiple records.
When lots of full-page images go into WAL just after a checkpoint we
don't incur as much of this overhead, since nearly every full-page
record forces a page switch. So we're removing overhead from where it
hurts the most and amortising it across other records.
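For concreteness, the zeroing Tom describes might look something like this sketch. The function name, the 12-byte constant, and the buffer-pointer arguments are all illustrative, not the actual xlog.c code:

```c
#include <string.h>

/* Hypothetical sketch: after copying a record into the WAL buffer and
 * before releasing WALInsertLock, zero the bytes that will eventually
 * hold the next record's xl_prev, so stale buffer contents can never
 * be mistaken for a valid record header at replay. */
#define XL_PREV_ZERO_BYTES 12

static void
zero_beyond_record(char *insert_pos, size_t record_len, char *page_end)
{
    char *tail = insert_pos + record_len;
    size_t avail = (size_t) (page_end - tail);
    size_t n = avail < XL_PREV_ZERO_BYTES ? avail : XL_PREV_ZERO_BYTES;

    /* Redundant work once the next record lands here, but cheap. */
    memset(tail, 0, n);
}
```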
Maths work for me.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services