
Re: enhance the efficiency of migrating particularly large tables

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	David Zhang <david(dot)zhang(at)highgo(dot)ca>
Cc:	Pgsql Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: enhance the efficiency of migrating particularly large tables
Date:	2024-04-08 22:23:28
Message-ID:	CAApHDvr=FLi63sPDZUokKtC094EcOz_suGsLgmNJs2U+WkysRA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, 9 Apr 2024 at 09:52, David Zhang <david(dot)zhang(at)highgo(dot)ca> wrote:
> However, when executing SELECT min(ctid) and max(ctid), it performs a
> Seq Scan, which can be slow for a large table. Is there a way to
> retrieve the minimum and maximum ctid other than using the system
> functions min() and max()?

Finding the exact ctid seems overkill for what you need. Why not
just find the maximum block with:

N = pg_relation_size('name_of_your_table'::regclass) /
current_setting('block_size')::int;

and do WHERE ctid < '(N,1)';
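
As a rough sketch of how that block count could drive a chunked copy, the Python below turns a relation size into N and then into a list of ctid range predicates. It assumes the goal is to split the table into block ranges. The function names, the default block size of 8192, and the chunk size are illustrative, not anything from PostgreSQL itself. In practice the size and block size would come from pg_relation_size() and current_setting('block_size').

```python
def block_count(relation_size_bytes, block_size=8192):
    """Return N, the number of blocks: relation size / block size."""
    return relation_size_bytes // block_size


def ctid_range_predicates(n_blocks, blocks_per_chunk):
    """Build WHERE-clause fragments covering blocks 0 .. n_blocks-1.

    Each fragment is half-open: it starts at offset 1 of its first block
    (tuple offsets start at 1) and stops before offset 1 of the first
    block of the next chunk, so chunks neither overlap nor leave gaps.
    The final upper bound is '(N,1)', as suggested above.
    """
    if blocks_per_chunk <= 0:
        raise ValueError("blocks_per_chunk must be positive")
    predicates = []
    for start in range(0, n_blocks, blocks_per_chunk):
        end = min(start + blocks_per_chunk, n_blocks)
        predicates.append(f"ctid >= '({start},1)' AND ctid < '({end},1)'")
    return predicates


if __name__ == "__main__":
    # Hypothetical table of 10 blocks, copied 4 blocks at a time.
    n = block_count(10 * 8192)
    for predicate in ctid_range_predicates(n, 4):
        print(f"SELECT * FROM name_of_your_table WHERE {predicate};")
```

Rows written to blocks added after N was measured fall outside these ranges, so a migration running against a live table would need to handle that separately.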

If we wanted to optimise this in PostgreSQL, the way to do it would
be, around set_plain_rel_pathlist(), to check if the relation's ctid is
a required PathKey by the same means as create_index_paths() does,
then, if found, create another seqscan path without
synchronize_seqscans * and tag that with the ctid PathKey, setting the
scan direction according to the PathKey direction. nulls_first does
not matter since ctid cannot be NULL.

A min(ctid) query should be able to make use of this, as the planner
should rewrite such aggregates to subqueries with an ORDER BY ctid
LIMIT 1.

* We'd need to invent an actual Path type for SeqScanPath as I see
create_seqscan_path() just uses the base struct Path.
synchronize_seqscans would have to become a property of that new Path
type and it would need to be carried forward into the plan and looked
at in the executor so that we always start a scan at the first or last
block.

I'm unsure if such a feature is worthwhile. I think maybe not for just
min(ctid)/max(ctid). However, there could be other reasons, such as
the transform OR to UNION stuff that Tom worked on a few years ago.
That needed to eliminate duplicate rows that matched both OR branches,
and that was done using ctid.

David

In response to

enhance the efficiency of migrating particularly large tables at 2024-04-08 21:52:13 from David Zhang

Responses

Re: enhance the efficiency of migrating particularly large tables at 2024-04-08 23:02:58 from Tom Lane
Re: enhance the efficiency of migrating particularly large tables at 2024-05-02 21:33:41 from David Zhang
