From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, smithpb2250(at)gmail(dot)com, vignesh21(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, jakub(dot)wartak(at)enterprisedb(dot)com, stark(dot)cfm(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, barwick(at)gmail(dot)com, jchampion(at)timescale(dot)com, pryzby(at)telsasoft(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, rjuju123(at)gmail(dot)com, jakub(dot)wartak(at)tomtom(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: In-placre persistance change of a relation |
Date: | 2024-06-04 00:09:12 |
Message-ID: | Zl5bKIdOWiHHb-9N@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, May 28, 2024 at 04:49:45PM -0700, Jelte Fennema-Nio wrote:
> Two notes after looking at this quickly during the advanced patch
> feedback session:
>
> 1. I would maybe split 0003 into two separate patches. One to make SET
> UNLOGGED fast, which seems quite easy to do because no WAL is needed.
> And then a follow up to make SET LOGGED fast, which does all the
> XLOG_FPI stuff.
Yeah, that would make sense. The LOGGED->UNLOGGED part is
straight-forward because we only care about the init fork. The
UNLOGGED->LOGGED case bugs me, though, a lot.
> 2. When wal_level = minitam, still some WAL logging is needed. The
> pages that were changed since the last still need to be made available
> for crash recovery.
More notes from me, as I was part of this session.
+ * XXXX: Some access methods don't support in-place persistence
+ * changes. GiST uses page LSNs to figure out whether a block has been
[...]
+ if (r->rd_rel->relkind == RELKIND_INDEX &&
+ /* GiST is excluded */
+ r->rd_rel->relam != BTREE_AM_OID &&
+ r->rd_rel->relam != HASH_AM_OID &&
+ r->rd_rel->relam != GIN_AM_OID &&
+ r->rd_rel->relam != SPGIST_AM_OID &&
+ r->rd_rel->relam != BRIN_AM_OID)
This knowledge should not be encapsulated in the backend code. The
index AMs should be able to tell, instead, if they are able to support
this code path so as any out-of-core index AM can decide things on its
own. This ought to be split in its own patch, simple enough as of a
boolean or a routine telling how this backend path should behave.
+ for (fork = 0; fork < INIT_FORKNUM; fork++)
+ {
+ if (smgrexists(RelationGetSmgr(r), fork))
+ log_newpage_range(r, fork, 0,
+ smgrnblocks(RelationGetSmgr(r), fork),
+ false);
+ }
A simple copy of the blocks means that we keep anything bloated in
them, while a rewrite in ALTER TABLE means that we would start afresh
by deforming the tuples from the origin before giving them to the
target, without any bloat. The compression of the FPWs and the
removal of the holes in the pages would surely limit the impact, but
this has not been discussed on this thread, and this is a nice
property of the existing implementation that would get silently
removed by this patch set.
Another point that Nathan has made is that it may be more appealling
to study how this is better than an integration with the multi-INSERT
APIs into AMs, so as it is possible to group the inserts in batches
rather than process them one-at-a-time, see [1]. I am ready to accept
that what this patch does is more efficient as long as everything is
block-based in some cases. Still there is a risk-vs-gain argument
here, and I am not sure whether what we have here is a good tradeoff
compared to the potential risk of breaking things. The amount of new
infrastructure is large for this code path. Grouping the inserts in
large batches may finish by being more efficient than a WAL stream
full of FPWs, as well, even if toast values are deformed? So perhaps
there is an argument for making that optional at query level, instead.
As a hole, I can say that grouping the INSERTs will be always more
efficient, while what we have here can be less efficient in some
cases. I'm OK to be outvoted, but the level of complications created
by this block-based copy and WAL-logging concerns me when it comes to
tweaking the relpersistence like that.
[1]: https://commitfest.postgresql.org/48/4777/
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2024-06-04 00:11:07 | Re: Proposal: Document ABI Compatibility |
Previous Message | Michael Paquier | 2024-06-03 23:30:19 | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows |