Re: pgbench rate limiting changes transaction latency computation

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench rate limiting changes transaction latency computation
Date: 2019-06-12 07:09:02
Message-ID: a411e48c-f5af-45a4-873a-770c5a2da863@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/06/2019 09:23, Fabien COELHO wrote:
>> But that was just because I was showing a simplistic example. E.g.
>> here's a log of a vacuum finishing, and then another starting a few
>> seconds later (both vacuums lasting a fair while):
>>
>> progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms stddev 590.153, lag 17517.751 ms
>> progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms stddev 600.661, lag 17739.952 ms
>> progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms stddev 594.784, lag 17959.784 ms
>> progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms stddev 632.729, lag 18394.960 ms
>> progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms stddev 617.462, lag 18786.247 ms
>> progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms stddev 669.072, lag 18496.431 ms
>> progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms stddev 689.843, lag 18101.596 ms
>> progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms stddev 683.498, lag 17949.363 ms
>> progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms stddev 685.074, lag 18107.938 ms
>> progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms stddev 676.572, lag 18242.472 ms
>> progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms stddev 675.334, lag 18048.482 ms
>> progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms stddev 688.186, lag 18291.323 ms
>> progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms stddev 691.656, lag 18572.506 ms
>> progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms stddev 709.580, lag 18822.244 ms
>> progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms stddev 709.682, lag 19107.562 ms
>>
>> The period inbetween where no vacuum was running is imo considerably
>> harder to spot looking at 'lat'.
>
> ISTM that the signal is pretty clear in whether the lag increases or
> decreases. Basically the database is 18 seconds behind its load, which is
> very bad if a user is waiting.

That was my thought too, when looking at this example. When there is
already a long queue of transactions, the txtime of individual
transactions doesn't matter much. The most important thing under that
condition is how fast the system can dissolve the queue (or how fast it
builds up even more). So the derivative of the lag or lat seems like the
most important figure. We don't print exactly that, but it's roughly the
same as the TPS. Jitter experienced by the user matters too, i.e. stddev
of 'lat'.

To illustrate this, imagine that the server magically detected that
there's a long queue of transactions. It would be beneficial to go into
"batch mode", where it collects incoming transactions into larger
batches. The effect of this imaginary batch mode is that the TPS rate
increases by 50%, but the txtime also increases by 1000%, and becomes
highly variable. Would that be a good tradeoff? I would say yes. The
user is experiencing an 18 s delay anyway, and the increase in txtime
would be insignificant compared to that, but the queue would be busted
more quickly.

Of course, there is no such batch mode in PostgreSQL, and I wouldn't
suggest trying to implement anything like that. In a different kind of
application, you would rather maintain a steady txtime when the server
is at full load, even if it means a lower overall TPS rate. And that
feels like a more important goal than just TPS. I think we all agree on
that. To simulate that kind of an application, though, you probably
don't want to use -R, or you would use it with --latency-limit. Except
clearly Andres is trying to do just that, which is why I'm still a bit
confused :-).

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-06-12 07:28:35 Re: [PATCH] Speedup truncates of relation forks
Previous Message legrand legrand 2019-06-12 07:08:38 Re: Adaptive query optimization