Re: New docs chapter on Transaction Management and related changes

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Treat <rob(at)xzilla(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New docs chapter on Transaction Management and related changes
Date: 2022-11-07 22:04:46
Message-ID: 3603e6e85544daa5300c7106c31bc52673711cd0.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2022-11-05 at 10:08 +0000, Simon Riggs wrote:
> Agreed; new compilation patch attached, including mine and then
> Robert's suggested rewordings.

Thanks. There is clearly a lot of usefule information in this.

Some comments:

> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -24673,7 +24673,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
> <para>
> Returns the current transaction's ID. It will assign a new one if the
> current transaction does not have one already (because it has not
> - performed any database updates).
> + performed any database updates); see <xref
> + linkend="transaction-id"/> for details. If executed in a
> + subtransaction this will return the top-level xid; see <xref
> + linkend="subxacts"/> for details.
> </para></entry>
> </row>

I would use a comma after "subtransaction", and I think it would be better to write
"transaction ID" instead of "xid".

> @@ -24690,6 +24693,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
> ID is assigned yet. (It's best to use this variant if the transaction
> might otherwise be read-only, to avoid unnecessary consumption of an
> XID.)
> + If executed in a subtransaction this will return the top-level xid.
> </para></entry>
> </row>

Same as above.

> @@ -24733,6 +24737,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
> <para>
> Returns a current <firstterm>snapshot</firstterm>, a data structure
> showing which transaction IDs are now in-progress.
> + Only top-level xids are included in the snapshot; subxids are not
> + shown; see <xref linkend="subxacts"/> for details.
> </para></entry>
> </row>

Again, I would avoid "xid" and "subxid", or at least use "transaction ID (xid)"
and similar.

> --- a/doc/src/sgml/ref/release_savepoint.sgml
> +++ b/doc/src/sgml/ref/release_savepoint.sgml
> @@ -34,23 +34,16 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
> <title>Description</title>
>
> <para>
> - <command>RELEASE SAVEPOINT</command> destroys a savepoint previously defined
> - in the current transaction.
> + <command>RELEASE SAVEPOINT</command> will subcommit the subtransaction
> + established by the named savepoint, if one exists. This will release
> + any resources held by the subtransaction. If there were any
> + subtransactions of the named savepoint, these will also be subcommitted.
> </para>
>
> <para>

"Subtransactions of the named savepoint" is somewhat confusing; how about
"subtransactions of the subtransaction established by the named savepoint"?

If that is too long and explicit, perhaps "subtransactions of that subtransaction".

> @@ -78,7 +71,7 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
>
> <para>
> It is not possible to release a savepoint when the transaction is in
> - an aborted state.
> + an aborted state, to do that use <xref linkend="sql-rollback-to"/>.
> </para>
>
> <para>

I think the following is more English:
"It is not possible ... state; to do that, use <xref .../>."

> --- a/doc/src/sgml/ref/rollback.sgml
> +++ b/doc/src/sgml/ref/rollback.sgml
> @@ -56,11 +56,14 @@ ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> <term><literal>AND CHAIN</literal></term>
> <listitem>
> <para>
> - If <literal>AND CHAIN</literal> is specified, a new transaction is
> + If <literal>AND CHAIN</literal> is specified, a new unaborted transaction is
> immediately started with the same transaction characteristics (see <xref
> linkend="sql-set-transaction"/>) as the just finished one. Otherwise,
> no new transaction is started.

I don't think that is an improvement. "Unaborted" is an un-word. A new transaction
is always "unaborted", isn't it?

> --- a/doc/src/sgml/wal.sgml
> +++ b/doc/src/sgml/wal.sgml
> @@ -909,4 +910,36 @@
> seem to be a problem in practice.
> </para>
> </sect1>
> +
> + <sect1 id="two-phase">
> +
> + <title>Two-Phase Transactions</title>
> +
> + <para>
> + <productname>PostgreSQL</productname> supports a two-phase commit (2PC)
[...]
> + <filename>pg_twophase</filename> directory. Currently-prepared
> + transactions can be inspected using <link
> + linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.
> + </para>
> + </sect1>
> +
> </chapter>

I don't like "currently-prepared". How about:
"Transaction that are currently prepared can be inspected..."

This is clearly interesting information, but I don't think the WAL chapter is the right
place for this. "pg_twophase" is already mentioned in "storage.sgml", and details about
when exactly a prepared transaction is persisted may exceed the details level needed by
the end user.

I'd look for that information in the reference page for PREPARE TRANSACTION; perhaps
that would be a better place. Or, even better, the new "xact.sgml" chapter.

> --- /dev/null
> +++ b/doc/src/sgml/xact.sgml

+ <title>Transaction Management</title>

+ The word transaction is often abbreviated as "xact".

Should use <quote> here.

> + <title>Transactions and Identifiers</title>

> + <para>
> + Once a transaction writes to the database, it is assigned a
> + non-virtual <literal>TransactionId</literal> (or <type>xid</type>),
> + e.g., <literal>278394</literal>. Xids are assigned sequentially
> + using a global counter used by all databases within the
> + <productname>PostgreSQL</productname> cluster. This property is used by
> + the transaction system to order transactions by their first database
> + write, i.e., lower-numbered xids started writing before higher-numbered
> + xids. Of course, transactions might start in a different order.
> + </para>

"This property"? How about:
"Because transaction IDs are assigned sequentially, the transaction system can
use them to order transactions by their first database write"

I would want some additional information here: why does the transaction system have
to order transactions by their first database write?

"Of course, transactions might start in a different order."

Now that confuses me. Are you saying that BEGIN could be in a different order
than the first database write? Perhaps like this:

"Note that the order in which transactions perform their first database write
might be different from the order in which the transactions started."

> + The internal transaction ID type <type>xid</type> is 32-bits wide

There should be no hyphen in "32 bits wide", just as in "3 years old".

> + A 32-bit epoch is incremented during each
> + wrap around.

We usually call this "wraparound" without a space.

> + There is also a 64-bit type <type>xid8</type> which
> + includes this epoch and therefore does not wrap around during the
> + life of an installation and can be converted to xid by casting.

Running "and"s. Better:

"There is also ... and does not wrap ... life of an installation.
<type>xid8</type> can be converted to <type>xid</type> by casting."

> + Xids are used as the
> + basis for <productname>PostgreSQL</productname>'s <link
> + linkend="mvcc">MVCC</link> concurrency mechanism, <link
> + linkend="hot-standby">Hot Standby</link>, and Read Replica servers.

What is the difference between a hot standby and a read replica? I think
one of these terms is sufficient.

> + In addition to <literal>vxid</literal> and <type>xid</type>,
> + when a transaction is prepared for two-phase commit it
> + is also identified by a Global Transaction Identifier
> + (<acronym>GID</acronym>).

Better:

"In addition to <literal>vxid</literal> and <type>xid</type>,
prepared transactions also have a Global Transaction Identifier
(<acronym>GID</acronym>) that is assigned when the transaction is
prepared for two-phase commit."

> + <sect1 id="xact-locking">
> +
> + <title>Transactions and Locking</title>
> +
> + <para>
> + Currently-executing transactions are shown in <link
> + linkend="view-pg-locks"><structname>pg_locks</structname></link>
> + in columns <structfield>virtualxid</structfield> and
> + <structfield>transactionid</structfield>.

Better:

"The transaction IDs of currently executing transactions are shown in <link
linkend="view-pg-locks"><structname>pg_locks</structname></link>
in the columns <structfield>virtualxid</structfield> and
<structfield>transactionid</structfield>."

> + Lock waits on table-level locks are shown waiting for
> + <structfield>virtualxid</structfield>, while lock waits on row-level
> + locks are shown waiting for <structfield>transactionid</structfield>.

That's not true. Transactions waiting for table-level locks are shown
waiting for a "relation" lock in both "pg_stat_activity" and "pg_locks".

> + Row-level read and write locks are recorded directly in locked
> + rows and can be inspected using the <xref linkend="pgrowlocks"/>
> + extension. Row-level read locks might also require the assignment
> + of multixact IDs (<literal>mxid</literal>). Mxids are recorded in
> + the <filename>pg_multixact</filename> directory.

"are recorded directly in *the* locked rows"

I think the mention of multixacts should link to
<xref linkend="vacuum-for-multixact-wraparound"/>. Again, I would not
specifically mention the directory, since it is already described in
"storage.sgml", but I have no strong optinion there.

> + <sect1 id="subxacts">
> +
> + <title>Subtransactions</title>

> + The word subtransaction is often abbreviated as
> + <literal>subxact</literal>.

I'd use <quote>, not <literal>.

> + If a subtransaction is assigned a non-virtual transaction ID,
> + its transaction ID is referred to as a <literal>subxid</literal>.

Again, I would use <quote>, since we don't <literal> "subxid"
elsewhere.

+ Up to
+ 64 open subxids are cached in shared memory for each backend; after
+ that point, the overhead increases significantly since we must look
+ up subxid entries in <filename>pg_subtrans</filename>.

Comma before "since". Perhaps you should mention that this means disk I/O.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-11-07 22:34:53 Re: Improve logging when using Huge Pages
Previous Message Jan Wieck 2022-11-07 21:54:29 Re: PL/pgSQL cursors should get generated portal names by default