From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Process local hint bit cache
Date: 2011-03-31 22:38:33
Message-ID: AANLkTimpUbs=bCHGN9oCV9oug55b5DUw4DaCuAXwXivM@mail.gmail.com
Lists: pgsql-hackers
On Thu, Mar 31, 2011 at 5:33 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Mar 30, 2011 at 2:35 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Wed, Mar 30, 2011 at 11:23 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> On 30.03.2011 18:02, Robert Haas wrote:
>>>>
>>>> On Wed, Mar 30, 2011 at 10:40 AM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
>>>>>
>>>>> But one way or another the hint bits have to get set sometime. The
>>>>> sooner that happens the less clog i/o has to happen in the meantime.
>>>>
>>>> I talked about this with Merlin a bit yesterday. I think that his
>>>> thought is that most transactions will access a small enough number of
>>>> distinct CLOG pages, and the cache accesses might be fast enough, that
>>>> we really wouldn't need to get the hint bits set, or at least that
>>>> vacuum/freeze time would be soon enough. I'm not sure if that's
>>>> actually true, though - I think the overhead of the cache might be
>>>> higher than he's imagining. However, there's a sure-fire way to find
>>>> out... code it up and see how it plays.
>>>
>>> I did a little experiment: I hacked SetHintBits() to do nothing, so that
>>> hint bits are never set. I then created a table with 100000 rows in it:
>>>
>>> postgres=# \d foo
>>> Table "public.foo"
>>> Column | Type | Modifiers
>>> --------+---------+-----------
>>> a | integer |
>>>
>>> postgres=# INSERT INTO foo SELECT generate_series(1, 100000);
>>> INSERT 0 100000
>>>
>>> And ran queries on the table:
>>>
>>> postgres=# do $$
>>> declare
>>> i int4;
>>> begin
>>> loop
>>> perform COUNT(*) FROM foo;
>>> end loop;
>>> end;
>>> $$;
>>>
>>> This test case is designed to exercise the visibility tests as much as
>>> possible. However, all the tuples are loaded in one transaction, so the
>>> one-element cache in TransactionLogFetch is 100% effective.
>>>
>>> I ran oprofile on that. It shows that about 15% of the time is spent in
>>> HeapTupleSatisfiesMVCC and its subroutines. 6.6% is spent in
>>> HeapTupleSatisfiesMVCC itself. Here's the breakdown of that:
>>>
>>> $ opreport -c --global-percent
>>>
>>> CPU: Intel Architectural Perfmon, speed 2266 MHz (estimated)
>>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
>>> mask of 0x00 (No unit mask) count 100000
>>> samples % app name symbol name
>>> ...
>>> -------------------------------------------------------------------------------
>>> 2143      0.4419  postgres  postgres  heapgettup_pagemode
>>> 73277    15.1091  postgres  postgres  heapgetpage
>>> 31858     6.5688  postgres  postgres  HeapTupleSatisfiesMVCC
>>> 31858     6.5688  postgres  postgres  HeapTupleSatisfiesMVCC [self]
>>> 12809     2.6411  postgres  postgres  TransactionIdIsInProgress
>>> 12091     2.4931  postgres  postgres  XidInMVCCSnapshot
>>> 7150      1.4743  postgres  postgres  TransactionIdIsCurrentTransactionId
>>> 7056      1.4549  postgres  postgres  TransactionIdDidCommit
>>> 1839      0.3792  postgres  postgres  TransactionIdPrecedes
>>> 1467      0.3025  postgres  postgres  SetHintBits
>>> 1155      0.2382  postgres  postgres  TransactionLogFetch
>>> -------------------------------------------------------------------------------
>>> ...
>>>
>>> I then ran the same test again with an unpatched version, to set the hint
>>> bits. After the hint bits were set, I ran oprofile again:
>>>
>>> -------------------------------------------------------------------------------
>>> 275 0.4986 postgres heapgettup_pagemode
>>> 4459 8.0851 postgres heapgetpage
>>> 3005 5.4487 postgres HeapTupleSatisfiesMVCC
>>> 3005 5.4487 postgres HeapTupleSatisfiesMVCC [self]
>>> 1620 2.9374 postgres XidInMVCCSnapshot
>>> 110 0.1995 postgres TransactionIdPrecedes
>>> -------------------------------------------------------------------------------
>>>
>>> So with hint bits set, HeapTupleSatisfiesMVCC accounts for 8% of the total
>>> CPU time, and without hint bits, 15%.
>>>
>>> Even if clog access was free, hint bits still give a significant speedup
>>> thanks to skipping all the other overhead like TransactionIdIsInProgress()
>>> and TransactionIdIsCurrentTransactionId(). Speeding up clog access is
>>> important; when the one-element cache isn't saving you, the clog access
>>> becomes a dominant factor. But you have to address that other overhead too
>>> before you can get rid of hint bits.
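(The one-element cache referred to here is the cachedFetchXid logic in src/backend/access/transam/transam.c. An abridged sketch of that existing code, with the permanent-xid special cases and commit-LSN bookkeeping omitted:

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus     cachedFetchXidStatus;

static XidStatus
TransactionLogFetch(TransactionId transactionId)
{
    XidStatus   xidstatus;
    XLogRecPtr  xidlsn;

    /* hit: skip the clog (SLRU) lookup entirely */
    if (TransactionIdEquals(transactionId, cachedFetchXid))
        return cachedFetchXidStatus;

    xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);

    /* cache only states that can never change again */
    if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
        xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
    {
        cachedFetchXid = transactionId;
        cachedFetchXidStatus = xidstatus;
    }

    return xidstatus;
}
)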
>>
>> Here is a patch demonstrating the caching action, but without the
>> cache table, which isn't done yet -- it only works using the very last
>> transaction id fetched. I used macros so I could keep the changes
>> quite localized.
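(The patch itself is attached to the original message. Purely as a guess at the shape of such a macro layer -- every name below is invented for illustration:

static TransactionId cachedXid = InvalidTransactionId;
static XidStatus     cachedXidStatus;

/* hypothetical: test the last-xid cache before falling through to clog */
#define XID_CACHE_HIT(xid) \
    (TransactionIdEquals((xid), cachedXid))

#define XID_CACHE_STATUS()  (cachedXidStatus)

/* hypothetical: remember only states that can never change again */
#define XID_CACHE_STORE(xid, status) \
    do { \
        if ((status) == TRANSACTION_STATUS_COMMITTED || \
            (status) == TRANSACTION_STATUS_ABORTED) \
        { \
            cachedXid = (xid); \
            cachedXidStatus = (status); \
        } \
    } while (0)
)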
>
> While playing with the data structure to implement an xid cache inside
> TransactionLogFetch, I learned a nasty lesson.
> HeapTupleSatisfiesMVCC is super sensitive to any changes, and even
> jumping through a couple of calls to TransactionLogFetch was
> measurable in a couple of performance test cases I came up with. I
> hauntingly recalled Tom's warnings to beware unexpected performance
> downsides.
>
> In short, regardless of implementation, I unfortunately don't think
> any caching at or under the clog abstraction is going to work without
> impacting at least some cases.
>
> OTOH, maybe some micro-optimization of HeapTupleSatisfiesMVCC could
> provide the same benefit I'm going after here. If you save off the
> last xid that came back committed in a static variable, you can
> check it at the same level as if it were a hint bit. From my
> testing, this is well and truly 'free' in the sense that it can
> defer/eliminate clog checks and hint bit i/o.
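(A minimal sketch of that idea, assuming it hooks into HeapTupleSatisfiesMVCC next to the existing HEAP_XMIN_COMMITTED test -- names and placement are illustrative, not the attached patch, and the in-progress and current-transaction checks are elided:

static TransactionId lastCommittedXid = InvalidTransactionId;

/* inside HeapTupleSatisfiesMVCC, where xmin is normally resolved: */
if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
    TransactionIdEquals(HeapTupleHeaderGetXmin(tuple), lastCommittedXid))
{
    /* xmin is known committed: as cheap as a real hint bit, with no
     * TransactionIdIsInProgress/TransactionIdDidCommit/clog work
     * (per the correction below, the hint bit could also be set
     * here without dirtying the page) */
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
{
    /* remember it, so the next tuple with this xmin is cheap */
    lastCommittedXid = HeapTupleHeaderGetXmin(tuple);
    SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                HeapTupleHeaderGetXmin(tuple));
}
)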
>
> The patch attached gives all the advantages I was after in the
hm, this isn't quite right -- if you get a 'last xid cache hit', the
tuple's hint bits should still be set if they are not already (but the
page should not be dirtied).

working on expanding the cache to # xids > 1.
merlin
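(As a sketch of what "# xids > 1" might look like, a small direct-mapped cache over recent xid statuses -- hypothetical, not the eventual patch:

#define XID_CACHE_SIZE 64       /* power of two, so the mask below works */

static TransactionId xidCacheXid[XID_CACHE_SIZE];
static XidStatus     xidCacheStatus[XID_CACHE_SIZE];

/* look up an xid; returns true and sets *status on a hit */
static inline bool
xid_cache_lookup(TransactionId xid, XidStatus *status)
{
    int         slot = xid & (XID_CACHE_SIZE - 1);

    if (TransactionIdEquals(xidCacheXid[slot], xid))
    {
        *status = xidCacheStatus[slot];
        return true;
    }
    return false;
}

/* store an xid's status, overwriting whatever shared the slot */
static inline void
xid_cache_store(TransactionId xid, XidStatus status)
{
    /* never cache in-progress states; they can still change */
    if (status == TRANSACTION_STATUS_COMMITTED ||
        status == TRANSACTION_STATUS_ABORTED)
    {
        int         slot = xid & (XID_CACHE_SIZE - 1);

        xidCacheXid[slot] = xid;
        xidCacheStatus[slot] = status;
    }
}
)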