Re: pgbench: INSERT workload, FK indexes, filler fix

From: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench: INSERT workload, FK indexes, filler fix
Date: 2021-07-01 15:54:02
Message-ID: lzsg0yuj1h.fsf@veeddrois.attlocal.net
Lists: pgsql-hackers


Fabien COELHO writes:

> Hello Greg,
>
> Some quick feedback about the patch and the arguments.
>
> Filling: having an empty string/NULL has been bothering me for some time. However there is a
> significant impact on the client/server network stream while initializing or running queries, which
> means that older pgbench performance reports would not be comparable to newer ones, which is a pain even
> if the new results do make sense, as you noted in a comment. I'm okay with breaking that, but it
> would require a consensus: People would run pgbench on a previous install, upgrade, run pgbench
> again, and report a massive performance regression. Who will have to deal with that noise?

I agree that it is a behavior change, but "filler" that literally includes nothing but a NULL bitmap
or minimal-length column isn't really measuring what it sets out to measure. To me it seems like we
need to bite the bullet and start doing what we already claim to be doing. This has been inaccurate
for a long time, and keeping it inaccurate in the name of consistency seems to be the wrong tack to
take here. (This is my argument to the group at large, not to you specifically.)

I assume that we will need to include a big note in the documentation about the behavior change,
perhaps even a note in the output of pgbench itself; the "right" way to do that can be bikeshedded.

> A work around could be to add new workloads with different names, and let the previous workloads
> more or less as is.

You're basically suggesting "tpcb-like-traditional" and "tpcb-like-actual"? :-) I guess that would
be an approach of sorts, though more than one of the built-ins needed to change in this patch, and I
question how useful multiplying these workloads would be.

> "--insert-only" as a short hand for "-b insert-only": I do not think this is really needed to save 1
> char. Also note that "-b i" would probably work.

Fair; I was just mirroring the existing structure.

> extra indexes: I'm ok on principle. Do we want an option for that though? Isn't adding "i" to -I
> enough? Also I do not like much the code which modifies the -I provided string to add a "i".

To me it seems disingenuous to set up a situation where you'd have FKs with no indexes, which is why
I'd added that modification; unless you're talking about something different?
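
For anyone skimming the thread, the behavior in question can be sketched roughly as below. This is an
illustration in Python, not the patch's C code. It assumes the extra-index step is spelled "i" as
proposed here, that "f" is the existing foreign-key step of -I, and that "i" is appended only when
"f" was requested without it; the function name is hypothetical.

```python
def add_fk_index_step(init_steps: str) -> str:
    """Return init_steps with 'i' appended when 'f' is present without it.

    Assumption: 'i' is the index step proposed in this thread; where it
    lands in the step order is a guess, here simply at the end.
    """
    if "f" in init_steps and "i" not in init_steps:
        return init_steps + "i"
    return init_steps


print(add_fk_index_step("dtgvpf"))  # FK step present, index step added
print(add_fk_index_step("dtgvp"))   # no FKs requested, string unchanged
```

The alternative raised upthread would be to leave the string untouched and have users pass "i"
explicitly.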

>> After bouncing the possibilities around a little, David and I thought this
>> specific set of changes might be the right amount of change for one PG
>> version.
>
> Hmmm. I was hoping for more changes:-) Eg the current error handling patch would be great.

I'm happy to continue working on improving this part of the program.

>> benchmark noise from where I started at with PG. The $750 USD AMD retail
>> chip in my basement lab pushes 1M TPS of prepared SELECT statements over
>> sockets. Plus or minus 84 bytes per row in a benchmark database doesn't
>> worry me so much anymore.
>
> AFAICR the space is actually allocated by pg and filled with blanks, just not transferred by the
> protocol? For an actual network connection I guess the effect should be quite noticeable.

This patchset included filling with actual bytes, not just padding (or implied padding via
char(n)). Depending on how this is invoked, it could definitely add some network overhead (though
I'd be surprised if it pushed a query past a single packet, given the original size of the query).
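
To put rough numbers on the single-packet guess, here is a back-of-the-envelope sketch in Python. The
1500-byte Ethernet MTU and 40 bytes of IPv4 plus TCP headers (no options) are typical values, and the
5-byte message header is the type byte plus length word of a protocol message. The 120-byte base
statement size is a made-up placeholder, not something measured from pgbench.

```python
# Back-of-the-envelope check: does adding filler bytes to a statement
# push it past one TCP segment? All sizes below are assumptions.

MTU = 1500                   # typical Ethernet MTU, bytes
IP_TCP_HEADERS = 40          # IPv4 (20) + TCP (20), no options
MSS = MTU - IP_TCP_HEADERS   # payload that fits in one segment

PG_MSG_HEADER = 5            # 1-byte message type + 4-byte length


def segments_needed(payload_bytes: int) -> int:
    """Number of TCP segments needed to carry one protocol message."""
    total = PG_MSG_HEADER + payload_bytes
    return -(-total // MSS)  # ceiling division


BASE_QUERY_BYTES = 120  # hypothetical statement size without filler
FILLER_BYTES = 84       # the per-row figure mentioned upthread

before = segments_needed(BASE_QUERY_BYTES)
after = segments_needed(BASE_QUERY_BYTES + FILLER_BYTES)
print(f"MSS={MSS}, segments before={before}, after={after}")
```

Under those assumptions a statement would have to approach 1.4 kB before the extra 84 bytes could
spill into a second segment.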

>> [...]
>> I personally would prefer to see pgbench lead by example here, that tables
>> related this way should be indexed with FKs by default, as the Right Way to
>> do such things.
>
> I do agree that the default should be the good choices, and that some manual effort should be done
> to get the bad ones. The only issue is that people do not like change.

Heh, you are not wrong here. Hopefully we can get some consensus about this being the right way
forward.

Best,

David
