Re: speedup COPY TO for partitioned table.

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speedup COPY TO for partitioned table.
Date: 2025-02-11 00:25:52
Message-ID: CAApHDvqPGoqvofphWmTp8EFMYa2KVg9ZGCsQAF6qp7ZHAsSd5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 11 Feb 2025 at 08:10, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> jian he <jian(dot)universality(at)gmail(dot)com>, 27 Oca 2025 Pzt, 04:47 tarihinde şunu yazdı:
>> so the find_all_inheritors output order is deterministic?
>
> You're right that order in find_all_inheritors is deterministic. But it's not always the same with the order of SELECT output. You can quickly see what I mean by running a slightly modified version of the example that you shared in your first email:

I think it's fine to raise the question as to whether the order
changing matters, however, I don't personally think there should be
any concerns with this. The main reason I think this is that the
command isn't the same, so the user shouldn't expect the same
behaviour. They'll need to adjust their commands to get the new
behaviour and possibly a different order.

Another few reasons are:

1) In the subquery version, the Append children are sorted by cost, so
the order isn't that predictable in the first place. (See
create_append_path() -> list_sort(subpaths,
append_total_cost_compare))
2) The order tuples are copied with COPY TO on non-partitioned tables
isn't that well defined in the first place. Two reasons for this, a)
the heap is a heap and has no defined order; and b) sync scans might
be used and the scan might start at any point in the heap and circle
back around again to the page prior to the page where the scan
started. See (table_beginscan() adds SO_ALLOW_SYNC to the flags).

I think the main thing to be concerned about regarding order is to
ensure that all rows from the same partition are copied consecutively,
and that does not seem to be at risk of changing here. This is
important as 3592e0ff9 added caching of the last found partition when
partition lookups continually find the same partition.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2025-02-11 00:26:42 Re: 2025-02-13 release announcement draft
Previous Message David G. Johnston 2025-02-11 00:25:42 Re: describe special values in GUC descriptions more consistently