Re: PATCH: pgbench - merging transaction logs

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: pgbench - merging transaction logs
Date: 2015-03-28 22:53:29
Message-ID: 551730E9.108@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28.3.2015 11:21, Fabien COELHO wrote:
>
> Hello Tomas,
>
>> I do agree that fprintf is not cheap, actually when profiling pgbench
>> it's often the #1 item, but the impact on the measurements is actually
>> quite small. For example with a small database (scale 10) and read-only
>> 30-second runs (single client), I get this:
>>
>> no logging: 18672 18792 18667 18518 18613 18547
>> with logging: 18170 18093 18162 18273 18307 18234
>>
>> So on average, that's 18634 vs. 18206, i.e. less than 2.5% difference.
>> And with more expensive transactions (larger scale, writes, ...) the
>> difference will be much smaller.
>
> I did some testing with a scale 10 prepared "SELECT only" 200 seconds
> plenty of runs with local socket connections on the largest host I have
> available:
>
> pgbench -P 10 -T 200 -S -M prepared -j $c -c $c ...
>
> I think that this cpu-bound bench is kind of a worst possible case for
> the detailed per transaction log.
>
> I also implemented a quick and dirty version for a merge log based on
> sharing a file handle (append mode + sprintf + fputs).
>
> The results are as follow:
>
> * 1 thread 33 runs median tps (average is consistent):
> - no logging: 22062
> - separate logging: 19360 (-12.2%)
> - merged logging: 19326 (-12.4%, not significant from previous)
>

Interesting. What hardware is this? I wouldn't be surprised by this
behavior on a multi-socket system, where the cross-cpu transfers are
likely more expensive on the single-socket I used (sorry, forgot to
mention what hardware I used).

> Note that the impact of logging is much larger than with your tests.
> The underlying fprintf comes from gnu libc 2.19.

I'm running glibc 2.19 too, on kernel 3.17 (might be important too).

> The worst overhead I could trigger is with 12 threads:
>
> * 12 threads 35 runs median tps (average is consistent)
> - no logging: 155917
> - separate logging: 124695 (-20.0%)
> - merged logging: 119349 (-23.5%)
>
> My conclusion from these figures is that although the direct merged
> logging approach adds some overhead, this overhead is small wrt
> detailed logging (it adds 3.5% to a 20% logging overhead) with 12
> threads. Other tests, even with more threads, did not yield larger
> absolute or relative overheads. Although the direct merge approach is
> shown to add overheads, this is a small additional overhead on a
> quite bad situation already, which suggest that using detailed log on
> a cpu-bound pgbench run is a bad idea to begin with.

Yeah, seems like that.

> For a more realistic test, I ran "simple updates" which involve actual
> disk writes. It ran at around 840 tps with 24 threads. The logging
> overhead seems to be under 1%, and there is no significant difference
> between separate and merged on the 20 runs.
>
> So my overall conclusion is:
>
> (1) The simple thread-shared file approach would save pgbench from
> post-processing merge-sort heavy code, for a reasonable cost.

No it wouldn't - you're missing the fact that the proposed approach
(shared file + fprintf) only works with raw transaction log.

It does not work with aggregated log - the threads would have to somehow
track the progress of the other threads somehow, in a very non-trivial
way (e.g. what if one of the threads executes a long query, and thus
does not send the results for a long time?). Another option would be to
update shared aggregated results, but that requires locking.

> (2) The feature would not be available for the thread-emulation with
> this approach, but I do not see this as a particular issue as I
> think that it is pretty much only dead code and a maintenance burden.

I'm not willing to investigate that, nor am I willing to implement
another feature that works only sometimes (I've done that in the past,
and I find it a bad practice).

If someone else is willing to try to eliminate the thread emulation, I
won't object to that. But as I pointed out above, simple fprintf is not
going to work for the aggregated log - solving that will need more code
(e.g. maintaining aggregated results for all threads, requiring
additional locking etc).

>
> (3) Optimizing doLog from its current fprintf-based implementation
> may be a good thing.

That's probably true. The simplest thing we can do right now is
buffering the data into larger chunks and writing those chunks. That
amortizes the costs of locking. Using O_APPEND, as suggested by Andres,
seems like a promising idea.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-03-28 22:53:50 Re: proposal: row_to_array function
Previous Message Petr Jelinek 2015-03-28 22:50:20 Re: Replication identifiers, take 4