Re: Document efficient self-joins / UPDATE LIMIT techniques.

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Document efficient self-joins / UPDATE LIMIT techniques.
Date: 2024-02-08 01:46:50
Message-ID: 35b96eb9034f9ab336b273b54e8818ed334e47b0.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2024-02-03 at 15:27 -0500, Corey Huinker wrote:
>
> Here's another attempt, applying Laurenz's feedback:

I like this patch much better.

Some comments:

> --- a/doc/src/sgml/ref/delete.sgml
> +++ b/doc/src/sgml/ref/delete.sgml
> @@ -234,6 +234,24 @@ DELETE FROM films
> In some cases the join style is easier to write or faster to
> execute than the sub-select style.
> </para>
> + <para id="delete-limit">
> + While there is no <literal>LIMIT</literal> clause for
> + <command>DELETE</command>, it is possible to get a similar effect
> + using the method for <command>UPDATE</command> operations described
> + <link linkend="update-limit">in greater detail here</link>.
> +<programlisting>
> +WITH delete_batch AS (
> + SELECT l.ctid
> + FROM user_logs AS l
> + WHERE l.status = 'archived'
> + ORDER BY l.creation_date
> + LIMIT 10000
> + FOR UPDATE
> +)
> +DELETE FROM user_logs AS ul
> +USING delete_branch AS del
> +WHERE ul.ctid = del.ctid;
> +</programlisting></para>
> </refsect1>
>
> <refsect1>

- About the style: there is usually an empty line between an ending </para>
and the next starting <para>. It does not matter for correctness, but I
think it makes the source easier to read.

- I would rather have only "here" as link text rather than "in greater details
here". Even better would be something that gives the reader a clue where
the link will take her, like
<link linkend="update-limit">the documentation of <command>UPDATE</command></link>.

- I am not sure if it is necessary to have the <programlisting> at all.
I'd say that it is just a trivial variation of the UPDATE example.
On the other hand, a beginner might find the example useful.
Not sure.

If I had my way, I'd just keep the first paragraph, something like

<para id="delete-limit">
While there is no <literal>LIMIT</literal> clause for
<command>DELETE</command>, it is possible to get a similar effect
using a self-join with a common table expression as described in the
<link linkend="update-limit"><command>UPDATE</command> examples</link>.
</para>

> diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
> index 2ab24b0523..49e0dc29de 100644
> --- a/doc/src/sgml/ref/update.sgml
> +++ b/doc/src/sgml/ref/update.sgml
> @@ -434,7 +434,6 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003';
> COMMIT;
> </programlisting>
> </para>
> -
> <para>
> Change the <structfield>kind</structfield> column of the table
> <structname>films</structname> in the row on which the cursor

Please don't.

I'm mostly fine with the UPDATE example.

> + it can make sense to perform the operation in smaller batches. Performing a
> + <command>VACUUM</command> operation on the table in between batches can help
> + reduce table bloat. The

I think the "in" before between is unnecessary and had better be removed, but
I'll defer to the native speaker.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-02-08 01:52:11 Re: glibc qsort() vulnerability
Previous Message Thomas Munro 2024-02-08 01:27:59 Re: cfbot is failing all tests on FreeBSD/Meson builds