From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | lars <lhofhansl(at)yahoo(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Ivan Voras <ivoras(at)freebsd(dot)org>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: UPDATEDs slowing SELECTs in a fully cached database |
Date: | 2011-07-27 14:15:52 |
Message-ID: | CABOikdPXPfY8C46YBnb0CbBJ80af4FOfD47KEJzS-=EyHA=u-Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Tue, Jul 12, 2011 at 6:15 PM, lars <lhofhansl(at)yahoo(dot)com> wrote:
>> Back to the first case, here's an strace from the backend doing the select
>> right after the updates.
>> write(13,
>> "f\320\1\0\1\0\0\0\273\0\0\0\0\340\27\22`\32\0\00000002833!000"..., 2400256)
>> = 2400256
>
> On Wed, Jul 13, 2011 at 9:46 AM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Code comments indicate that they expect the pruning to be a pretty
>> clear win on multiple reads, although I don't know how much that was
>> benchmarked. Jeff does raise a good point, though -- it seems odd
>> that WAL-logging of this pruning would need to be synchronous. We
>> support asynchronous commits -- why not use that feature
>
> Right -- here are my thoughts. notice the above is writing out 293
> pages. this is suggesting to me that Kevin is right and you've
> identified a pattern where you are aggravating the page cleanup
> facilities of HOT. What threw me off here (and perhaps bears some
> additional investigation) is that early on in the report you were
> claiming an update to an indexed field which effectively disables HOT.
There are couple of other (very important) things that HOT does, but
probably its not advertised a lot. Even for non-HOT updates (which
means either indexed columns were changed or page ran out of free
space) or deletes, HOT prunes those tuples and instead mark the line
pointer as DEAD. The page is defragmented and dead space is recovered.
Each such dead tuple now only consumes two bytes in the page until
vacuum removes the dead line pointers. Thats the reason why OP is
seeing the behavior even when index columns are being updated.
We made a few adjustments to ensure that a page is not pruned too
early. So we track the oldest XID that did any updates/deletes to the
page and attempt pruning only when the RecentXmin is past the XID. We
also mark the page as "full" if some previous update did not find
enough free space to do in-block update and use that hint to decide if
we should attempt to prune the page. Finally, we prune only if we get
the cleanup lock without blocking.
What might be worth looking at this condition in pruneheap.c:
/*
* We prune when a previous UPDATE failed to find enough space on the page
* for a new tuple version, or when free space falls below the relation's
* fill-factor target (but not less than 10%).
*
* Checking free space here is questionable since we aren't holding any
* lock on the buffer; in the worst case we could get a bogus answer. It's
* unlikely to be *seriously* wrong, though, since reading either pd_lower
* or pd_upper is probably atomic. Avoiding taking a lock seems more
* important than sometimes getting a wrong answer in what is after all
* just a heuristic estimate.
*/
minfree = RelationGetTargetPageFreeSpace(relation,
HEAP_DEFAULT_FILLFACTOR);
minfree = Max(minfree, BLCKSZ / 10);
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
So if the free space in a page falls below the fill-factor or 10% of
the block size, we would try to prune the page. We probably need to
revisit this area and see if we need to tune HOT ever better. One
option could be to see how much space we are going to free and carry
out the operation only if its significant enough to justify the cost.
I know we had done several benchmarking tests while HOT development,
but the tuning mechanism still may not be perfect for all kinds of
work loads and it would probably never be.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Terry Schmitt | 2011-07-27 16:02:53 | Re: Hardware advice for scalable warehouse db |
Previous Message | shailesh singh | 2011-07-27 09:11:09 | issue related to logging facility of postgres |