From: | Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com> |
Cc: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: In-placre persistance change of a relation |
Date: | 2021-12-17 09:10:30 |
Message-ID: | AM8PR07MB8248217A3AFEC0DD4D59ED83F6789@AM8PR07MB8248.eurprd07.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> Kyotaro wrote:
> > In this version, I got rid of the "CLEANUP FORK"s, and added a new
> > system "Smgr marks". The mark files have the name of the
> > corresponding fork file followed by ".u" (which means Uncommitted.).
> > "Uncommited"-marked main fork means the same as the
> CLEANUP2_FORKNUM
> > and uncommitted-marked init fork means the same as the
> CLEANUP_FORKNUM
> > in the previous version.x
> >
> > I noticed that the previous version of the patch still leaves an
> > orphan main fork file after "BEGIN; CREATE TABLE x; ROLLBACK; (crash
> > before checkpoint)" since the "mark" file (or CLEANUP2_FORKNUM) is
> > revmoed at rollback. In this version the responsibility to remove the
> > mark files is moved to SyncPostCheckpoint, where main fork files are
> > actually removed.
>
> For the record, I noticed that basebackup could be confused by the mark files
> but I haven't looked that yet.
>
Good morning Kyotaro,
the patch didn't apply clean (it's from March; some hunks were failing), so I've fixed it and the combined git format-patch is attached. It did conflict with the following:
b0483263dda - Add support for SET ACCESS METHOD in ALTER TABLE
7b565843a94 - Add call to object access hook at the end of table rewrite in ALTER TABLE
9ce346eabf3 - Report progress of startup operations that take a long time.
f10f0ae420 - Replace RelationOpenSmgr() with RelationGetSmgr().
I'm especially worried if I didn't screw up something/forgot something related to the last one (rd->rd_smgr changes), but I'm getting "All 210 tests passed".
Basic demonstration of this patch (with wal_level=minimal):
create unlogged table t6 (id bigint, t text);
-- produces 110GB table, takes ~5mins
insert into t6 select nextval('s1'), repeat('A', 1000) from generate_series(1, 100000000);
alter table t6 set logged;
on baseline SET LOGGED takes: ~7min10s
on patched SET LOGGED takes: 25s
So basically one can - thanks to this patch - use his application (performing classic INSERTs/UPDATEs/DELETEs, so without the need to rewrite to use COPY) and perform literally batch upload and then just switch the tables to LOGGED.
Some more intensive testing also looks good, assuming table prepared to put pressure on WAL:
create unlogged table t_unlogged (id bigint, t text) partition by hash (id);
create unlogged table t_unlogged_h0 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 0);
[..]
create unlogged table t_unlogged_h3 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 3);
Workload would still be pretty heavy on LWLock/BufferContent,WALInsert and Lock/extend .
t_logged.sql = insert into t_logged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~1MB of WAL
t_unlogged.sql = insert into t_unlogged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~3kB of WAL
so using: pgbench -f <tabletypetest>.sql -T 30 -P 1 -c 32 -j 3 t
with synchronous_commit =ON(default):
with t_logged.sql: tps = 229 (lat avg = 138ms)
with t_unlogged.sql tps = 283 (lat avg = 112ms) # almost all on LWLock/WALWrite
with synchronous_commit =OFF:
with t_logged.sql: tps = 413 (lat avg = 77ms)
with t_unloged.sql: tps = 782 (lat avg = 40ms)
Afterwards switching the unlogged ~16GB partitions takes 5s per partition.
As the thread didn't get a lot of traction, I've registered it into current commitfest https://commitfest.postgresql.org/36/3461/ with You as the author and in 'Ready for review' state.
I think it behaves as almost finished one and apparently after reading all those discussions that go back over 10years+ time span about this feature, and lot of failed effort towards wal_level=noWAL I think it would be nice to finally start getting some of that of it into the core.
-Jakub Wartak.
Attachment | Content-Type | Size |
---|---|---|
v7-0001-In-place-table-persistence-change-with-new-comman.patch | application/octet-stream | 84.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2021-12-17 09:17:09 | Re: Transparent column encryption |
Previous Message | Peter Eisentraut | 2021-12-17 09:06:05 | Re: [PoC] Delegating pg_ident to a third party |