Re: unlogged tables

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>, dgabriel <gabriel(dot)dodan(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: unlogged tables
Date: 2015-04-14 00:32:20
Message-ID: CAKFQuwZhXngoQ9xG9N0fvQYVTvXQoO4cO8r-ZYHN+xLWiJEqCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Apr 13, 2015 at 4:49 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
>> On Monday, April 13, 2015, Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>
>> wrote:
>>
>>>
>>> On Mon, Apr 13, 2015 at 4:31 PM, dgabriel <gabriel(dot)dodan(at)gmail(dot)com>
>>> wrote:
>>>
>>>> "In the event of a normal shutdown, we can flush all the writes to disk
>>>> so we know all the data has been written, so there is no need to
>>>> truncate."
>>>>
>>>> Isn't possible to periodically flush data to disk and in case of crush
>>>> postgres to load only the data that existed at last flush? The periodic
>>>> flush could be configurable, for example every 30 minutes or after x
>>>> rows
>>>> updated/inserted.
>>>>
>>>
>>> There is no such facility implemented for UNLOGGED TABLEs. That could be
>>> a feature request though.
>>>
>>
> One way would be to lock dirty buffers from unlogged relations into
> shared_buffers (which hardly seems like a good thing) until the start of a
> "super-checkpoint" and then write them all out as fast as possible (which
> kind of defeats checkpoint_completion_target). And then if the crash
> happened during a super-checkpoint, the data would still be inconsistent
> and need to be truncated.
>
>
>>
>>>
>> Well, that is half right anyway. UNLOGGED tables obey checkpoints just
>> like any other table.
>>
>
> Do they? I thought they only obeyed shutdown checkpoints, not online
> checkpoints. I do remember some changes around this area, but none that
> completely reverted that logic.
>
>
​I vaguely recall that conversation now...I'm not positive on the exact
mechanics here and, as it pertains to the OP, the difference you describe
is immaterial since in either case the status quo mandates an "all or
nothing" approach to an unlogged table's contents.​

>
>
>> The missing feature is an option to leaved restored the last checkpoint.
>> Instead, not knowing whether there were changes since the last checkpoint,
>> the system truncated the relation.
>>
>> What use case is there for a behavior that the last checkpoint data is
>> left on the relation upon restarting - not knowing whether it was possible
>> the other data could have been written subsequent?
>>
>
> I would like a way to have unlogged tables be available on a replica
> provided that no changes were made to them between the pg_basebackup and
> the recovery point.
>

> My use case is that I mark certain read-only-after-bulk-loading tables as
> unlogged solely to avoid blowing out the log archive during the loading
> phase and refresh phase. This is stuff like vendor catalogs, NCBI
> datasets, ChEMBL datasets, etc, which can simply be re-derived from the
> reference. It would be nice if these were still available (without having
> to repeat the ETL) after crashes provided they were not written to since a
> checkpoint, and available on cloned test servers without having to repeat
> the ETL on those as well.
>
>

​My gut reaction is that those should be in their own clusters and accessed
via postgres_fdw...

That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the slaves
and allows for the slave to basically "rewrite" its existing table by
pointing to the newly supplied version. Some kind of "CREATE STATIC TABLE"
and "PUSH STATIC TABLE TO {all | replica name}" command combo...though
ideally with less manual intervention...

David J.​

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jim Nasby 2015-04-14 02:45:21 Re: unlogged tables
Previous Message Jeff Janes 2015-04-13 23:49:08 Re: unlogged tables