Re: Process local hint bit cache

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Process local hint bit cache
Date: 2011-03-30 19:35:42
Message-ID: AANLkTikpDH5b9=QogCxL+3H=rqc+sMXC3F02VvWp33Yx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 30, 2011 at 11:23 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 30.03.2011 18:02, Robert Haas wrote:
>>
>> On Wed, Mar 30, 2011 at 10:40 AM, Greg Stark<gsstark(at)mit(dot)edu>  wrote:
>>>
>>> But one way or another the hint bits have to get set sometime. The
>>> sooner that happens the less clog i/o has to happen in the meantime.
>>
>> I talked about this with Merlin a bit yesterday.  I think that his
>> thought is that most transactions will access a small enough number of
>> distinct CLOG pages, and the cache accesses might be fast enough, that
>> we really wouldn't need to get the hint bits set, or at least that
>> vacuum/freeze time would be soon enough.  I'm not sure if that's
>> actually true, though - I think the overhead of the cache might be
>> higher than he's imagining.  However, there's a sure-fire way to find
>> out... code it up and see how it plays.
>
> I did a little experiment: I hacked SetHintBits() to do nothing, so that
> hint bits are never set. I then created a table with 100000 rows in it:
>
> postgres=# \d foo
>      Table "public.foo"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  a      | integer |
>
> postgres=# INSERT INTO foo SELECT generate_series(1, 100000);
> INSERT 0 100000
>
> And ran queries on the table:
>
> postgres=# do $$
> declare
>  i int4;
> begin
>  loop
>    perform COUNT(*) FROM foo;
>  end loop;
> end;
> $$;
>
> This test case is designed to exercise the visibility tests as much as
> possible. However, all the tuples are loaded in one transaction, so the
> one-element cache in TransactionLogFetch is 100% effective.
>
> I ran oprofile on that. It shows that about 15% of the time is spent in
> HeapTupleSatisfiesMVCC and its subroutines. 6.6% is spent in
> HeapTupleSatisfiesMVCC itself. Here's the breakdown of that:
>
> $ opreport  -c --global-percent
>
> CPU: Intel Architectural Perfmon, speed 2266 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 100000
> samples  %        app name                 symbol name
> ...
> -------------------------------------------------------------------------------
>  2143      0.4419  postgres                 postgres heapgettup_pagemode
>  73277    15.1091  postgres                 postgres heapgetpage
> 31858 6.5688  postgres                 postgres HeapTupleSatisfiesMVCC
>  31858 6.5688  postgres                 postgres HeapTupleSatisfiesMVCC
> [self]
>  12809     2.6411  postgres                 postgres
> TransactionIdIsInProgress
>  12091     2.4931  postgres                 postgres XidInMVCCSnapshot
>  7150      1.4743  postgres                 postgres
> TransactionIdIsCurrentTransactionId
>  7056      1.4549  postgres                 postgres TransactionIdDidCommit
>  1839      0.3792  postgres                 postgres TransactionIdPrecedes
>  1467      0.3025  postgres                 postgres SetHintBits
>  1155      0.2382  postgres                 postgres TransactionLogFetch
> -------------------------------------------------------------------------------
> ...
>
> I then ran the same test again with an unpatched version, to set the hint
> bits. After the hint bits were set, I ran oprofile again:
>
> -------------------------------------------------------------------------------
>  275       0.4986  postgres                 heapgettup_pagemode
>  4459      8.0851  postgres                 heapgetpage
> 3005      5.4487  postgres                 HeapTupleSatisfiesMVCC
>  3005      5.4487  postgres                 HeapTupleSatisfiesMVCC [self]
>  1620      2.9374  postgres                 XidInMVCCSnapshot
>  110       0.1995  postgres                 TransactionIdPrecedes
> -------------------------------------------------------------------------------
>
> So with hint bits set, HeapTupleSatisfiesMVCC accounts for 8% of the total
> CPU time, and without hint bits, 15%.
>
> Even if clog access was free, hint bits still give a significant speedup
> thanks to skipping all the other overhead like TransactionIdIsInProgress()
> and TransactionIdIsCurrentTransactionId(). Speeding up clog access is
> important; when the one-element cache isn't saving you the clog access
> becomes a dominant factor. But you have to address that other overhead too
> before you can get rid of hint bits.

Here is a patch demonstrating the caching action, but without the
cache table, which isn't done yet -- It only works using the very last
transaction id fetched. I used macros so I could keep the changes
quite localized.

The payoff is obvious:

stock postgres:
postgres=# create table v as select generate_series(1,50000000) v;
select count(*) from v;
SELECT 50000000
Time: 70010.160 ms

select count(*) from v;
run 1: 64.5 seconds <-- !
run 2: 21.3 seconds
run 3: 19.3 seconds

hint bit patch:
run 1: 19.2 seconds <-- the 'money shot'
run 2: 20.7 seconds
run 3: 19.3 seconds

Of course, until I get the cache table mechanism finished, you only
see real benefit if you have significant # of pages all the same
transaction. otoh, checking the last transaction has cost of 0, so
your worst case performance is the old behavior. I'm pretty sure I
can make a cache that is cheap to check and maintain because I'm
completely free from locks, side effects, etc.

btw I haven't forgotten your idea to move TransactionIdInProgress
Down. I think this is a good idea, and will experiment with it pre and
post cache.

merlin

Attachment Content-Type Size
hbcache.patch application/octet-stream 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2011-03-30 19:39:05 Re: Problem with pg_upgrade?
Previous Message Robert Haas 2011-03-30 19:32:26 Re: pg_dump --binary-upgrade vs. ALTER TYPE ... DROP ATTRIBUTE