RE: Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2020-11-17 01:45:53
Message-ID: TYAPR01MB2990128F309EF1CBD16EFB4DFEE20@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
> I'm also concerned about the way that this proposed feature interacts with
> incremental backup capabilities that already exist in tools like pgBackRest,
> EDB's BART, pg_probackup, and future things we might want to introduce into
> core, along the lines of what I have previously proposed. Now, I think
> pgBackRest uses only timestamps and checksums, so it probably doesn't care,
> but some of the other solutions rely on WAL-scanning to gather a list of
> changed blocks. I guess there's no reason that they can't notice the wal_level
> being changed and do the right thing; they should probably have that kind of
> capability already. Still, it strikes me that it might be useful if we had a stronger
> mechanism.

From a quick look, those backup tools seem to require wal_level to be set to replica or higher. That is not surprising, because recovering the database needs WAL for non-relation resources such as pg_control and the relation map. So I think wal_level = none won't introduce new issues compared to wal_level = minimal, which can also lack WAL records for some data updates.

> By the way, another problem here is that some AMs - e.g. GiST, IIRC - use LSNs
> to figure out whether a block has changed. For temporary and unlogged tables,
> we use "fake" LSNs that are generated using a counter, but that approach only
> works because such relations are never really WAL-logged. Mixing fake LSNs
> and real LSNs will break stuff, and not bumping the LSN when the page
> changes probably will, too.

Unlogged GiST indexes use fake LSNs that are instance-wide. Temporary GiST indexes use backend-local sequence values. Other unlogged and temporary relations don't set LSNs on pages. So I think it's enough to call GetFakeLSNForUnloggedRel() when wal_level = none as well.
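
To make the idea concrete, here is a rough standalone sketch of the LSN choice I have in mind. It is not PostgreSQL source: the enum, the counters, the wal_level_none flag and sketch_next_page_lsn() are all hypothetical stand-ins, and the shared counter only imitates what GetFakeLSNForUnloggedRel() hands out. It also assumes that under wal_level = none every GiST page change would take the instance-wide fake LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef uint64_t SketchLsn;

typedef enum
{
    PERSISTENCE_PERMANENT,
    PERSISTENCE_UNLOGGED,
    PERSISTENCE_TEMP
} SketchPersistence;

/* Imitates the instance-wide fake LSN counter (shared memory in a real server). */
static SketchLsn shared_fake_lsn = 1;

/* Imitates the backend-local counter used for temporary GiST indexes. */
static SketchLsn local_fake_lsn = 1;

/* Imitates the real WAL insert position, advanced by each logged change. */
static SketchLsn real_wal_lsn = 1000;

/* Assumption: true when the proposed wal_level = none is in effect. */
static bool wal_level_none = false;

/* Return the LSN to stamp on a GiST page after a change. */
static SketchLsn
sketch_next_page_lsn(SketchPersistence persistence)
{
    switch (persistence)
    {
        case PERSISTENCE_TEMP:
            /* Only this backend sees the index, so a local counter is enough. */
            return local_fake_lsn++;

        case PERSISTENCE_UNLOGGED:
            /* Visible to all backends, so the counter must be instance-wide. */
            return shared_fake_lsn++;

        case PERSISTENCE_PERMANENT:
            /* Proposed: with no WAL written, fall back to the shared fake LSN. */
            if (wal_level_none)
                return shared_fake_lsn++;
            /* Otherwise pretend a WAL record of 8 bytes was inserted. */
            real_wal_lsn += 8;
            return real_wal_lsn;
    }

    assert(false);              /* unreachable for valid enum values */
    return 0;
}
```

The sketch only shows where the LSN would come from. It does not cover what happens to pages that already carry real LSNs when the server is switched to wal_level = none and back, which is the mixing problem you mentioned.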

Regards
Takayuki Tsunakawa
