Re: PATCH: pgbench - merging transaction logs

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: pgbench - merging transaction logs
Date: 2015-03-31 01:20:31
Message-ID: 5519F65F.7040005@2ndquadrant.com
Lists: pgsql-hackers

On 29.3.2015 10:58, Fabien COELHO wrote:

>>> So my overall conclusion is:
>>>
>>> (1) The simple thread-shared file approach would save pgbench from
>>> post-processing merge-sort heavy code, for a reasonable cost.
>>
>> No it wouldn't - you're missing the fact that the proposed approach
>> (shared file + fprintf) only works with raw transaction log.
>>
>> It does not work with aggregated log - the threads would have to somehow
>> track the progress of the other threads, in a very non-trivial
>> way (e.g. what if one of the threads executes a long query, and thus
>> does not send the results for a long time?).
>
> The counters are updated when the transaction is finished anyway?

Yes, but the thread does not know it's time to write the results until
it completes the first transaction after the interval ends ...

Let's say the very first query in thread #1 takes a minute for some
reason, while the other threads process 100 transactions per second.
Before thread #1 can report 0 transactions for the first second, the
other threads already have results for 60 intervals.

I think there's no way to make this work except for somehow tracking the
timestamp of the last submitted results for each thread, and only
flushing results older than the minimum of those timestamps. But that's
not trivial - it's certainly more complicated than just writing into a
shared file descriptor.
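
Just to illustrate what I mean, here is a rough sketch of the bookkeeping
that would be needed - all the names (SharedProgress, report_progress,
flush_aggregates_up_to) are made up for the example, nothing from the
patch:

#include <limits.h>
#include <pthread.h>

#define MAX_THREADS 64

typedef struct SharedProgress
{
    pthread_mutex_t lock;
    long    last_reported[MAX_THREADS]; /* end of last reported interval */
    int     nthreads;
} SharedProgress;

/* called by a thread after it reports the interval ending at "ts" */
static void
report_progress(SharedProgress *sp, int thread_id, long ts)
{
    long    min_ts = LONG_MAX;
    int     i;

    pthread_mutex_lock(&sp->lock);
    sp->last_reported[thread_id] = ts;

    /* only intervals older than the slowest thread are safe to flush */
    for (i = 0; i < sp->nthreads; i++)
        if (sp->last_reported[i] < min_ts)
            min_ts = sp->last_reported[i];
    pthread_mutex_unlock(&sp->lock);

    /* flush_aggregates_up_to(min_ts);   hypothetical flush routine */
}

And even this glosses over who actually owns the per-interval aggregates
and when they get written out.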

>> Another option would be to update shared aggregated results, but
>> that requires locking.
>
> That is what I had in mind. ISTM that the locking impact would be
> much lower than for logging: the data is just locked for a counter
> update, and if counters are per-thread, a conflict may only occur
> when the data are gathered for actual logging, which would be rather
> infrequent. Even if the counters are shared, the locking would be for
> a short time, which would imply a low conflict probability. So I do not
> see this one as a significant performance issue.

No, for the reasons I explained above (thread not sending results for a
long time, etc.).

Merging results for each transaction would not have this issue, but it
would also use the lock much more frequently, and that seems like a
pretty bad idea (especially for the workloads with short transactions
that you suggested are a bad match for the detailed log - this would make
the aggregated log bad too).

Also notice that all the threads will try to merge the data (and thus
acquire the lock) at almost the same time - this is especially true
for very short transactions. I would be surprised if this did not cause
issues on read-only workloads with large numbers of threads.
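
For completeness, this is roughly what the shared-aggregate update looks
like - each thread merges its private counters into a shared struct under
a mutex (again, the names are hypothetical, not pgbench code):

#include <pthread.h>
#include <stdint.h>

typedef struct SharedAgg
{
    pthread_mutex_t lock;
    int64_t     cnt;            /* transactions in this interval */
    double      latency_sum;
    double      latency_min;
    double      latency_max;
} SharedAgg;

static void
merge_thread_results(SharedAgg *agg, int64_t cnt,
                     double lat_sum, double lat_min, double lat_max)
{
    pthread_mutex_lock(&agg->lock);
    agg->cnt += cnt;
    agg->latency_sum += lat_sum;
    if (lat_min < agg->latency_min)
        agg->latency_min = lat_min;
    if (lat_max > agg->latency_max)
        agg->latency_max = lat_max;
    pthread_mutex_unlock(&agg->lock);
}

The merge itself is cheap, but with short transactions every thread ends
up calling it (and hitting the lock) at essentially the same moment, which
is exactly the contention I'm worried about.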

>>> (2) The feature would not be available for the thread-emulation with
>>> this approach, but I do not see this as a particular issue as I
>>> think that it is pretty much only dead code and a maintenance burden.
>>
>> I'm not willing to investigate that, nor am I willing to implement
>> another feature that works only sometimes (I've done that in the past,
>> and I find it a bad practice).
>
> Hmmm. Keeping an obsolete feature with significant impact on how
> other features can be implemented, so basically a maintenance burden,
> does not look like best practice *either*.

That is not what I said, though.

If thread emulation is considered obsolete - fine, let's remove it. But
that's not really certain at this point, and until it's actually removed
I'm not particularly thrilled by adding a feature that does not work
with --disable-thread-safety.

>> If someone else is willing to try to eliminate the thread
>> emulation, I won't object to that.
>
> Hmmm. I'll try to trigger a discussion in another thread to test the
> idea.

OK, good.

>
>> But as I pointed out above, simple fprintf is not going to work
>> for the aggregated log - solving that will need more code (e.g.
>> maintaining aggregated results for all threads, requiring
>> additional locking etc).
>
> The code for that is probably simple and short, and my wish is to
> try to avoid an external merge-sort post-processing step, if possible,
> which is not especially cheap anyway, neither in code nor in time.

First, I don't think it's as simple as you think.

Second, while the merge sort is not free, it happens *after* the pgbench
run has completed and does not interfere with it at all.

>>> (3) Optimizing doLog from its current fprintf-based implementation
>>> may be a good thing.
>>
>> That's probably true. The simplest thing we can do right now is
>> buffering the data into larger chunks and writing those chunks.
>> That amortizes the costs of locking.
>
> If it is buffered in the process, that would mean more locking. If
> it is buffered per thread, that would result in out-of-order logs.
> Would that be an issue? It would be fine with me.

You're mixing two things - optimizing doLog and writing into a shared
file. I was talking about optimizing doLog as it stands now, i.e. per
thread, and in that case it does not cause any out-of-order logs.

Also, if the lock for the shared buffer is cheaper than the lock
required for fprintf, it may still be an improvement.
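
To be concrete, the buffering I have in mind for the current per-thread
doLog is something like this (the buffer size and names are just an
illustration, not actual patch code):

#include <stdio.h>
#include <string.h>

#define LOG_BUFFER_SIZE (64 * 1024)

typedef struct LogBuffer
{
    char    data[LOG_BUFFER_SIZE];
    size_t  used;
} LogBuffer;

static void
log_flush(LogBuffer *buf, FILE *logfile)
{
    if (buf->used > 0)
    {
        fwrite(buf->data, 1, buf->used, logfile);
        buf->used = 0;
    }
}

static void
log_line(LogBuffer *buf, FILE *logfile, const char *line)
{
    size_t  len = strlen(line);

    /* a line larger than the whole buffer is written out directly */
    if (len >= sizeof(buf->data))
    {
        log_flush(buf, logfile);
        fwrite(line, 1, len, logfile);
        return;
    }

    if (buf->used + len > sizeof(buf->data))
        log_flush(buf, logfile);

    memcpy(buf->data + buf->used, line, len);
    buf->used += len;
}

Each thread flushes its own buffer into its own file, so the lines stay
ordered within a thread; the point is simply that the per-call overhead
(including the locking inside the stdio functions) is paid once per chunk
instead of once per transaction.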

>> Using O_APPEND, as suggested by Andres, seems like a promising idea.
>
> I tried that with a shared file handle, but the impact seemed
> negligible. The figures I reported used it, btw.
>
> I also tried to open the same file in append mode from all threads,
> with positive performance effects, but then the flush did not occur
> at line boundaries, so there was some mangling in the result.

OK.
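
FWIW, if the append-mode idea gets revisited, the mangling should go away
as long as each complete line is emitted with a single write(2) instead of
going through stdio buffering - something along these lines (a sketch
only; whether a short write is effectively atomic still depends on the
platform and filesystem):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
open_shared_log(const char *path)
{
    /* O_APPEND makes the kernel position every write at the current EOF */
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

static void
write_log_line(int fd, const char *line)
{
    char    buf[256];
    int     len;

    len = snprintf(buf, sizeof(buf), "%s\n", line);
    if (len > 0 && len < (int) sizeof(buf))
    {
        /* one write per complete line, so lines are not torn apart */
        if (write(fd, buf, (size_t) len) < 0)
            perror("write");
    }
}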

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
