Re: pgbench - implement strict TPC-B benchmark

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - implement strict TPC-B benchmark
Date: 2019-08-02 18:31:06
Message-ID: 20190802183106.aj4rmyvdlv6y5py4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-08-02 10:34:24 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> Thanks a lot for these feedbacks and comments.
>
> > Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
> > e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
> > a pretty significant share.
>
> Fine, but what is the corresponding server load? 211%? 611%? And what actual
> time is spent in pgbench itself, vs libpq and syscalls?

System wide pgbench, including libpq, is about 22% of the whole system.

As far as I can tell there's a number of things that are wrong:
- prepared statement names are recomputed for every query execution
- variable name lookup is done for every command, rather than once, when
parsing commands
- a lot of string->int->string type back and forths

> Conclusion: pgbench-specific overheads are typically (much) below 10% of the
> total client-side cpu cost of a transaction, while over 90% of the cpu cost
> is spent in libpq and system, for the worst case do-nothing query.

I don't buy that that's the actual worst case, or even remotely close to
it. I e.g. see higher pgbench overhead for the *modify* case than for
the pgbench's readonly case. And that's because some of the meta
commands are slow, in particular everything related to variables. And
the modify case just has more variables.

>
> > + 12.35% pgbench pgbench [.] threadRun
> > + 3.54% pgbench pgbench [.] dopr.constprop.0
> > + 3.30% pgbench pgbench [.] fmtint
> > + 1.93% pgbench pgbench [.] getVariable
>
> ~ 21%, probably some inlining has been performed, because I would have
> expected to see significant time in "advanceConnectionState".

Yea, there's plenty inlining. Note dopr() is string processing.

> > + 2.95% pgbench libpq.so.5.13 [.] PQsendQueryPrepared
> > + 2.15% pgbench libpq.so.5.13 [.] pqPutInt
> > + 4.47% pgbench libpq.so.5.13 [.] pqParseInput3
> > + 1.66% pgbench libpq.so.5.13 [.] pqPutMsgStart
> > + 1.63% pgbench libpq.so.5.13 [.] pqGetInt
>
> ~ 13%

A lot of that is really stupid. We need to improve
libpq. PqsendQueryGuts (attributed to PQsendQueryPrepared here), builds
the command in many separate pqPut* commands, which reside in another
translation unit, is pretty sad.

> > + 3.16% pgbench libc-2.28.so [.] __strcmp_avx2
> > + 2.95% pgbench libc-2.28.so [.] malloc
> > + 1.85% pgbench libc-2.28.so [.] ppoll
> > + 1.85% pgbench libc-2.28.so [.] __strlen_avx2
> > + 1.85% pgbench libpthread-2.28.so [.] __libc_recv
>
> ~ 11%, str is a pain… Not sure who is calling though, pgbench or
> libpq.

Both. Most of the strcmp is from getQueryParams()/getVariable(). The
dopr() is from pg_*printf, which is mostly attributable to
preparedStatementName() and getVariable().

> This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear
> about where system time is measured.

It was excluded in this profile, both to reduce profiling costs, and to
focus on pgbench.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-08-02 19:48:59 Re: pglz performance
Previous Message Andres Freund 2019-08-02 18:20:03 Re: pglz performance