Re: unlogged tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>, dgabriel <gabriel(dot)dodan(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: unlogged tables
Date: 2015-04-14 17:37:19
Message-ID: CAMkU=1xartQjvQ3aO2YGrmDr5wqwxuD4_uWFZy5zfOLjgh=MJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Apr 13, 2015 at 8:28 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Mon, Apr 13, 2015 at 7:45 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
> wrote:
>
>>
>>
>> There's been recent discussion of adding support for read-only tables. If
>> we had those, we might be able to support something like...
>>
>> INSERT INTO unlogged;
>> ALTER TABLE unlogged SET READ ONLY;
>> CHECKPOINT;
>> /* take backup */
>>
>> This should be safe as long as we WAL log changes to read-only status
>> (which presumably we would).
>>
>> How much work that would entail though, I don't know.
>>
>
Right. I've been keeping an eye on that discussion with the same
intention. The big question is how, during recovery, does it know what
state the table was in without being able to read from the system
catalogs? Perhaps it would be the checkpointer's duty at the end of the
checkpoint to remove the init fork for unlogged relations which were turned
to read only before that checkpoint started.

>
>> Ultimately you still have to get the data over to the other machine
>> anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
>> logging of bulk inserts (and especially COPY into a known empty table) a
>> lot more efficient.
>>
>>
> ​Jeff Janes makes a comment about wanting "...to avoid blowing out the
> log archive..."; which I also don't quite follow...
>

I think the WAL logging of bulk COPY is pretty space-efficient already,
provided it is not indexed at the time of the COPY. But no amount of
efficiency improvement is going to make them small enough for me want to
keep the WAL logs around beyond the next base backup.

What I would really want is a way to make two separate WAL streams; changes
to this set of tables goes to the "keep forever, for PITR" stream, and
changes to this other set of tables go to the "keep until pg_basebackup is
next run" stream. Of course you couldn't have fk constraints between the
two different sets of tables.

Having to get the data over to the other machine doesn't bother me, it is
just a question of how to do it without permanently intermingling it with
WAL logs which I want to keep forever.

The FDW would be a good option, except the overhead (both execution
overhead and the overhead of poor plans) seems to be too large. I haven't
explored it as much as I would like.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2015-04-14 19:58:32 Re: Some performance testing?
Previous Message Jeff Janes 2015-04-14 16:58:46 Re: unlogged tables