From: | Adam Hendel <adam(at)tembo(dot)io> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [PATCH] pgbench log file headers |
Date: | 2023-11-21 04:22:21 |
Message-ID: | CABfuTggAAJ8qgY4jNt9msUTk-VOBMRHGAZqOFpd3T4QiOJz+_Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
On Mon, Nov 13, 2023 at 6:01 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
> > Currently, pgbench will log individual transactions to a logfile when the
> > `--log` parameter flag is provided. The logfile, however, does not
> include
> > column header. It has become a fairly standard expectation of users to
> have
> > column headers present in flat files. Without the header in the pgbench
> log
> > files, new users must navigate to the docs and piece together the column
> > headers themselves. Most industry leading frameworks have tooling built
> in
> > to read column headers though, for example python/pandas read_csv().
>
> The disadvantage of doing that is that a single pgbench run with --log will
> generate many files when using -j, to avoid contention. The easiest way to
> deal with that after the run is to cat all the log files to a larger one.
> If
> you do that with headers present, you obviously have a few bogus rows (the
> heades from the various files).
>
> I guess you could avoid the "worst" of that by documenting that the
> combined
> log file should be built by
> cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
> and only outputting the header in the file generated by the main thread.
>
>
> > We can improve the experience for users by adding column headers to
> pgbench
> > logfiles with an optional command line flag, `--log-header`. This will
> keep
> > the API backwards compatible by making users opt-in to the column
> headers.
> > It follows the existing pattern of having conditional flags in pgbench’s
> > API; the `--log` option would have both –log-prefix and –log-header if
> this
> > work is accepted.
>
> > The implementation considers the column headers only when the
> > `--log-header` flag is present. The values for the columns are taken
> > directly from the “Per-Transaction Logging” section in
> > https://www.postgresql.org/docs/current/pgbench.html and takes into
> account
> > the conditional columns `schedule_lag` and `retries`.
> >
> >
> > Below is an example of what that logfile will look like:
> >
> >
> > pgbench postgres://postgres:postgres(at)localhost:5432/postgres --log
> > --log-header
> >
> > client_id transaction_no time script_no time_epoch time_us
>
> Independent of your patch, but we imo ought to combine time_epoch time_us
> in
> the log output. There's no point in forcing consumers to combine those
> fields,
> and it makes logging more expensive... And if we touch this, we should
> just
> switch to outputting nanoseconds instead of microseconds.
>
> It also is quite odd that we have "time" and "time_epoch", "time_us", where
> time is the elapsed time of an individual "transaction" and time_epoch +
> time_us together are a wall-clock timestamp. Without differentiating
> between
> those, the column headers seem not very useful, because one needs to look
> in
> the documentation to understand the fields.
>
>
> I don't think there's all that strong a need for backward compatibility in
> pgbench. We could just change the columns as I suggest above and then
> always
> emit the header in the "main" log file.
>
I updated the patch to always log the header and only off the main thread.
As for the time headers, I will work on renaming/combining those in a
separate commit as:
time -> time_elapsed
time_epoch + time_us -> time_completion_us
> Greetings,
>
> Andres Freund
>
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Adds-colum-headers-to-the-log-file-produced-by-pg.patch | application/octet-stream | 2.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2023-11-21 04:28:54 | Re: Synchronizing slots from primary to standby |
Previous Message | Michael Paquier | 2023-11-21 03:54:36 | Re: Use of backup_label not noted in log |