Re: Disable WAL logging to speed up data loading

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-11-09 15:18:08
Message-ID: 20201109151807.GG16415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* osumi(dot)takamichi(at)fujitsu(dot)com (osumi(dot)takamichi(at)fujitsu(dot)com) wrote:
> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I'm not sure that wal_level=none is really the right way to address this
> > use-case. We already have unlogged tables and that's pretty clean and
> > meets the "we want to load data fast without having to pay for WAL" use case.
> > The argument here seems to be that to take advantage of unlogged tables
> > requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> > TABLE' command instead of a 'CREATE TABLE' command. That doesn't
> > seem like a huge leap, but we could make it easier by just adding a
> > 'table_log_default' or such GUC that could be set on the data loading role to
> > have all tables created by it be unlogged.
> I'm afraid to say that in the case to setup all tables as unlogged,
> the user are forced to be under tension to
> back up *all* commands from application, in preparation for unexpected crash.
> This is because whenever the server crashes,
> the unlogged tables are truncated and the DBA needs to
> input the processings after the last backup again without exception.
> I didn't think that this was easy and satisfied the user.

You'll need to explain how this is different from the proposed
'wal_level = none' option, since it sounded like that would be exactly
the same case..?

> In addition, as long as the tables are unlogged, the user cannot be released from
> this condition or (requirement ?) to back up all commands or
> to guarantee that all commands are repeatable for the DBA.

They can change the table to be logged though, if they wish to.

> When I consider the use case is the system of data warehouse
> as described upthread, the size of each table can be large.
> Thus, changing the status from unlogged to logged (recoverable)
> takes much time under the current circumstances, which was discussed before.

Ok- so the issue is that, today, we dump all of the table into the WAL
when we go from unlogged to logged, but as I outlined previously,
perhaps that's because we're missing a trick there when
wal_level=minimal. If wal_level=minimal, then it would seem like we
could lock the table, then sync it and then mark is as logged, which is
more-or-less what you're asking to have be effectively done with the
proposed wal_level=none, but this would be an optimization for all
existing users of wal_level=minimal who have unlogged tables that they
want to change to logged, and this works on a per-table basis instead,
which seems like a better approach than a cluster-wide setting.

> By having the limited window of time,
> during wal_level=none, I'd like to make wal_level=none work to
> localize and minimize the burden to guarantee all commands are
> repeatable. To achieve this, after switching wal_level from none to higher ones,
> the patch must ensure crash recovery, though.

Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
to marked a bunch of unlogged tables over to being logged would be good
to add too.

> Sorry that my current patch doesn't complete this aspect fully at present
> but, may I have your opinion about this ?

Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going. Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-09 15:25:24 Re: -Wformat-signedness
Previous Message Alvaro Herrera 2020-11-09 15:11:43 Re: Prevent printing "next step instructions" in initdb and pg_upgrade