From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "bharath(dot)rupireddyforpostgres(at)gmail(dot)com" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Show WAL write and fsync stats in pg_stat_io |
Date: | 2024-01-10 12:59:24 |
Message-ID: | CAN55FZ1c=p0Gwqa=hUqoKHStEtkMqMecdhC5YQf-RbStZVhPhw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, 10 Jan 2024 at 08:25, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jan 03, 2024 at 04:10:58PM +0300, Nazir Bilal Yavuz wrote:
> >
> > I thought removing op_bytes completely ( as you said "This patch
> > extends it with two more operation sizes, and there are even cases
> > where it may be a variable" ) from pg_stat_io view then adding
> > something like {read | write | extend}_bytes and {read | write |
> > extend}_calls could be better, so that we don't lose any information.
>
> But then you'd lose the possibility to analyze correlations between
> the size and the number of the operations, which is something that
> matters for more complex I/O scenarios. This does not need to be
> tackled in this patch, which is useful on its own, though I am really
> wondering if this is required for the recent work done by Thomas.
> Perhaps Andres, Thomas or Melanie could comment on that?
Yes, you are right.
> >> Yeah, a limitation like that may be acceptable for now. Tracking the
> >> WAL writer and WAL sender activities can be relevant in a lot of cases
> >> even if we don't have the full picture for the WAL receiver yet.
> >
> > I added that and disabled B_WAL_RECEIVER backend with comments
> > explaining why. v8 is attached.
>
> I can see that's what you have been adding here, which should be OK:
>
> > - if (track_io_timing)
> > + /*
> > + * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp IOs
> > + * but these IOs are not countable for now because IOOP_READ IOs' op_bytes
> > + * (number of bytes per unit of I/O) might not be the same all the time.
> > + * The current implementation requires that the op_bytes must be the same
> > + * for the same IOObject, IOContext and IOOp. To avoid confusion, the
> > + * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for
> > + * now.
> > + */
> > + if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL)
> > + return;
>
> This could be worded better, but that's one of these nits from me I
> usually tweak when committing stuff.
Thanks for doing that! Do you have any specific comments that can help
improve it?
> > +/*
> > + * Decide if IO timings need to be tracked. Timings associated to
> > + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled,
> > + * else rely on track_io_timing.
> > + */
> > +static bool
> > +pgstat_should_track_io_time(IOObject io_object)
> > +{
> > + if (io_object == IOOBJECT_WAL)
> > + return track_wal_io_timing;
> > +
> > + return track_io_timing;
> > +}
>
> One thing I was also considering is if eliminating this routine would
> make pgstat_count_io_op_time() more readable the result, but I cannot
> get to that.
I could not think of a way to eliminate pgstat_should_track_io_time()
route without causing performance regressions. What do you think about
moving inside of 'pgstat_should_track_io_time(io_object) if check' to
another function and call this function from
pgstat_count_io_op_time()? This does not change anything but IMO it
increases the readability.
> > if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
> > {
> > - pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > + if (io_object != IOOBJECT_WAL)
> > + pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> > if (io_object == IOOBJECT_RELATION)
> > INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time);
> > else if (io_object == IOOBJECT_TEMP_RELATION)
> > @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
> > }
> > else if (io_op == IOOP_READ)
> > {
> > - pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > + if (io_object != IOOBJECT_WAL)
> > + pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> > if (io_object == IOOBJECT_RELATION)
> > INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, io_time);
> > else if (io_object == IOOBJECT_TEMP_RELATION)
>
> A second thing is if this would be better with more switch/cases, say:
> switch (io_op):
> {
> case IOOP_EXTEND:
> case IOOP_WRITE:
> switch (io_object):
> {
> case WAL:
> /* do nothing */
> break;
> case RELATION:
> case TEMP:
> .. blah ..
> }
> break;
> case IOOP_READ:
> switch (io_object):
> {
> .. blah ..
> }
> break;
> }
>
> Or just this one to make it clear that nothing happens for WAL
> objects:
> switch (io_object):
> {
> case WAL:
> /* do nothing */
> break;
> case RELATION:
> switch (io_op):
> {
> case IOOP_EXTEND:
> case IOOP_WRITE:
> .. blah ..
> case IOOP_READ:
> .. blah ..
> }
> break;
> case TEMP:
> /* same switch as RELATION */
> break;
> }
>
> This duplicates a bit things, but at least in the second case it's
> clear which counters are updated when I/O timings are tracked. It's
> OK by me if people don't like this suggestion, but that would avoid
> bugs like the one I found upthread.
I am more inclined towards the second one because it is more likely
that a new io_object will be introduced rather than a new io_op. So, I
think the second one is a bit more future proof.
--
Regards,
Nazir Bilal Yavuz
Microsoft
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias van de Meent | 2024-01-10 13:03:28 | Re: the s_lock_stuck on perform_spin_delay |
Previous Message | Aleksander Alekseev | 2024-01-10 12:44:25 | Re: System username in pg_stat_activity |