Re: unlogged tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>, dgabriel <gabriel(dot)dodan(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: unlogged tables
Date: 2015-04-13 23:49:08
Message-ID: CAMkU=1z_QRgLcGd1YTHcScbL4GFBcmPsuPxw92qy6HRXtHiscQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Monday, April 13, 2015, Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>
> wrote:
>
>>
>> On Mon, Apr 13, 2015 at 4:31 PM, dgabriel <gabriel(dot)dodan(at)gmail(dot)com>
>> wrote:
>>
>>> "In the event of a normal shutdown, we can flush all the writes to disk
>>> so we know all the data has been written, so there is no need to
>>> truncate."
>>>
>>> Isn't possible to periodically flush data to disk and in case of crush
>>> postgres to load only the data that existed at last flush? The periodic
>>> flush could be configurable, for example every 30 minutes or after x rows
>>> updated/inserted.
>>>
>>
>> There is no such facility implemented for UNLOGGED TABLEs. That could be
>> a feature request though.
>>
>
One way would be to lock dirty buffers from unlogged relations into
shared_buffers (which hardly seems like a good thing) until the start of a
"super-checkpoint" and then write them all out as fast as possible (which
kind of defeats checkpoint_completion_target). And then if the crash
happened during a super-checkpoint, the data would still be inconsistent
and need to be truncated.

>
>>
> Well, that is half right anyway. UNLOGGED tables obey checkpoints just
> like any other table.
>

Do they? I thought they only obeyed shutdown checkpoints, not online
checkpoints. I do remember some changes around this area, but none that
completely reverted that logic.

> The missing feature is an option to leaved restored the last checkpoint.
> Instead, not knowing whether there were changes since the last checkpoint,
> the system truncated the relation.
>
> What use case is there for a behavior that the last checkpoint data is
> left on the relation upon restarting - not knowing whether it was possible
> the other data could have been written subsequent?
>

I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup and
the recovery point.

My use case is that I mark certain read-only-after-bulk-loading tables as
unlogged solely to avoid blowing out the log archive during the loading
phase and refresh phase. This is stuff like vendor catalogs, NCBI
datasets, ChEMBL datasets, etc, which can simply be re-derived from the
reference. It would be nice if these were still available (without having
to repeat the ETL) after crashes provided they were not written to since a
checkpoint, and available on cloned test servers without having to repeat
the ETL on those as well.

As for "maybe its corrupt, maybe it isn't, but lets keep them anyway",
yeah, I have little use for that.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David G. Johnston 2015-04-14 00:32:20 Re: unlogged tables
Previous Message Jim Nasby 2015-04-13 22:38:25 Re: unlogged tables