From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Disable WAL logging to speed up data loading |
Date: | 2020-11-09 15:18:08 |
Message-ID: | 20201109151807.GG16415@tamriel.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greetings,
* osumi(dot)takamichi(at)fujitsu(dot)com (osumi(dot)takamichi(at)fujitsu(dot)com) wrote:
> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I'm not sure that wal_level=none is really the right way to address this
> > use-case. We already have unlogged tables and that's pretty clean and
> > meets the "we want to load data fast without having to pay for WAL" use case.
> > The argument here seems to be that to take advantage of unlogged tables
> > requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> > TABLE' command instead of a 'CREATE TABLE' command. That doesn't
> > seem like a huge leap, but we could make it easier by just adding a
> > 'table_log_default' or such GUC that could be set on the data loading role to
> > have all tables created by it be unlogged.
> I'm afraid to say that in the case to setup all tables as unlogged,
> the user are forced to be under tension to
> back up *all* commands from application, in preparation for unexpected crash.
> This is because whenever the server crashes,
> the unlogged tables are truncated and the DBA needs to
> input the processings after the last backup again without exception.
> I didn't think that this was easy and satisfied the user.
You'll need to explain how this is different from the proposed
'wal_level = none' option, since it sounded like that would be exactly
the same case..?
> In addition, as long as the tables are unlogged, the user cannot be released from
> this condition or (requirement ?) to back up all commands or
> to guarantee that all commands are repeatable for the DBA.
They can change the table to be logged though, if they wish to.
> When I consider the use case is the system of data warehouse
> as described upthread, the size of each table can be large.
> Thus, changing the status from unlogged to logged (recoverable)
> takes much time under the current circumstances, which was discussed before.
Ok- so the issue is that, today, we dump all of the table into the WAL
when we go from unlogged to logged, but as I outlined previously,
perhaps that's because we're missing a trick there when
wal_level=minimal. If wal_level=minimal, then it would seem like we
could lock the table, then sync it and then mark is as logged, which is
more-or-less what you're asking to have be effectively done with the
proposed wal_level=none, but this would be an optimization for all
existing users of wal_level=minimal who have unlogged tables that they
want to change to logged, and this works on a per-table basis instead,
which seems like a better approach than a cluster-wide setting.
> By having the limited window of time,
> during wal_level=none, I'd like to make wal_level=none work to
> localize and minimize the burden to guarantee all commands are
> repeatable. To achieve this, after switching wal_level from none to higher ones,
> the patch must ensure crash recovery, though.
Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
to marked a bunch of unlogged tables over to being logged would be good
to add too.
> Sorry that my current patch doesn't complete this aspect fully at present
> but, may I have your opinion about this ?
Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going. Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.
Thanks,
Stephen
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-11-09 15:25:24 | Re: -Wformat-signedness |
Previous Message | Alvaro Herrera | 2020-11-09 15:11:43 | Re: Prevent printing "next step instructions" in initdb and pg_upgrade |