Re: In-placre persistance change of a relation

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, michael(at)paquier(dot)xyz
Cc: nathandbossart(at)gmail(dot)com, postgres(at)jeltef(dot)nl, smithpb2250(at)gmail(dot)com, vignesh21(at)gmail(dot)com, jakub(dot)wartak(at)enterprisedb(dot)com, stark(dot)cfm(at)gmail(dot)com, barwick(at)gmail(dot)com, jchampion(at)timescale(dot)com, pryzby(at)telsasoft(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, rjuju123(at)gmail(dot)com, jakub(dot)wartak(at)tomtom(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Subject: Re: In-placre persistance change of a relation
Date: 2024-09-01 19:15:00
Message-ID: b0f3fa46-ba5e-4353-b060-56e9eea1214f@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31/08/2024 19:09, Kyotaro Horiguchi wrote:
> - UNDO log(0002): This handles file deletion during transaction aborts,
> which was previously managed, in part, by the commit XLOG record at
> the end of a transaction.
>
> - Prevent orphan files after a crash (0005): This is another use-case
> of the UNDO log system.

Nice, I'm very excited if we can fix that long-standing issue! I'll try
to review this properly later, but at a quick 5 minute glance, one thing
caught my eye:

This requires fsync()ing the per-xid undo log file every time a relation
is created. I fear that can be a pretty big performance hit for
workloads that repeatedly create and drop small tables. Especially if
they're otherwise running with synchronous_commit=off. Instead of
flushing the undo log file after every write, I'd suggest WAL-logging
the undo log like regular relations and SLRUs. So before writing the
entry to the undo log, WAL-log it. And with a little more effort, you
could postpone creating the files altogether until a checkpoint happens,
similar to how twophase state files are checkpointed nowadays.

I wonder if the twophase state files and undo log files should be merged
into one file. They're similar in many ways: there's one file per
transaction, named using the XID. I haven't thought this fully through,
just a thought..

> +static void
> +undolog_set_filename(char *buf, TransactionId xid)
> +{
> + snprintf(buf, MAXPGPATH, "%s/%08x", SIMPLE_UNDOLOG_DIR, xid);
> +}

I'd suggest using FullTransactionId. Doesn't matter much, but seems like
a good future-proofing.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-01 19:30:27 Re: scalability bottlenecks with (many) partitions (and more)
Previous Message PG Bug reporting form 2024-09-01 19:00:01 BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()