Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-03-06 22:00:27
Message-ID: 20230306220027.GA3122138@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

+void
+XLogReadFromBuffers(XLogRecPtr startptr,
+ TimeLineID tli,
+ Size count,
+ char *buf,
+ Size *read_bytes)

Since this function presently doesn't return anything, can we have it
return the number of bytes read instead of storing it in a pointer
variable?

+ ptr = startptr;
+ nbytes = count;
+ dst = buf;

These variables seem superfluous.

+ /*
+ * Requested WAL isn't available in WAL buffers, so return with
+ * what we have read so far.
+ */
+ break;

nitpick: I'd move this to the top so that you can save a level of
indentation.

if (expectedEndPtr != endptr)
break;

... logic for when the data is found in the WAL buffers ...

+ /*
+ * All the bytes are not in one page. Read available bytes on
+ * the current page, copy them over to output buffer and
+ * continue to read remaining bytes.
+ */

Is it possible to memcpy more than a page at a time?

+ /*
+ * The fact that we acquire WALBufMappingLock while reading the WAL
+ * buffer page itself guarantees that no one else initializes it or
+ * makes it ready for next use in AdvanceXLInsertBuffer().
+ *
+ * However, we perform basic page header checks for ensuring that
+ * we are not reading a page that just got initialized. Callers
+ * will anyway perform extensive page-level and record-level
+ * checks.
+ */

Hm. I wonder if we should make these assertions instead.

+ elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for given LSN %X/%X, Timeline ID %u",
+ *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli);

I definitely don't think we should put an elog() in this code path.
Perhaps this should be guarded behind WAL_DEBUG.

+ /*
+ * Check if we have read fully (hit), partially (partial hit) or
+ * nothing (miss) from WAL buffers. If we have read either partially or
+ * nothing, then continue to read the remaining bytes the usual way,
+ * that is, read from WAL file.
+ */
+ if (count == read_bytes)
+ {
+ /* Buffer hit, so return. */
+ return true;
+ }
+ else if (read_bytes > 0 && count > read_bytes)
+ {
+ /*
+ * Buffer partial hit, so reset the state to count the read bytes
+ * and continue.
+ */
+ buf += read_bytes;
+ startptr += read_bytes;
+ count -= read_bytes;
+ }
+
+ /* Buffer miss i.e., read_bytes = 0, so continue */

I think we can simplify this. We effectively take the same action any time
"count" doesn't equal "read_bytes", so there's no need for the "else if".

if (count == read_bytes)
return true;

buf += read_bytes;
startptr += read_bytes;
count -= read_bytes;

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2023-03-06 22:09:58 Re: add PROCESS_MAIN to VACUUM
Previous Message David Rowley 2023-03-06 21:52:08 Re: using memoize in in paralel query decreases performance