Re: Error with index on unlogged table

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error with index on unlogged table
Date: 2015-12-04 12:57:54
Message-ID: CAB7nPqTJGbRdexanOk2HUXqWU1MDTxv4tSQ9MSQtBCvUJ=m9DQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 4, 2015 at 5:10 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
>> Andres Freud wrote:
>> >> extern void InitXLogInsert(void);
>> >> diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
>> >> index ad1eb4b..91445f1 100644
>> >> --- a/src/include/catalog/pg_control.h
>> >> +++ b/src/include/catalog/pg_control.h
>> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
>> >> #define XLOG_END_OF_RECOVERY 0x90
>> >> #define XLOG_FPI_FOR_HINT 0xA0
>> >> #define XLOG_FPI 0xB0
>> >> +#define XLOG_FPI_FOR_SYNC 0xC0
>> >
>> >
>> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
>> > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or
>> > instead adding actual record data and a 'flags' field in there? I
>> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
>> > different, XLOG_FPI_FOR_SYNC not so much.
>>
>> Let's go for XLOG_FPI_FLUSH.
>
> I think the other way is a bit better, because we can add new flags
> without changing the WAL format.

Hm. On the contrary, I think that it would make more sense to have a
flag as well for FOR_HINT honestly, those are really the same
operations, and FOR_HINT is just here for statistic purposes.

>> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
>> index 99337b0..b646101 100644
>> --- a/src/backend/access/brin/brin.c
>> +++ b/src/backend/access/brin/brin.c
>> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>> brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>> BRIN_CURRENT_VERSION);
>> MarkBufferDirty(metabuf);
>> - log_newpage_buffer(metabuf, false);
>> +
>> + /*
>> + * For unlogged relations, this page should be immediately flushed
>> + * to disk after being replayed. This is necessary to ensure that the
>> + * initial on-disk state of unlogged relations is preserved when
>> + * they get reset at the end of recovery.
>> + */
>> + log_newpage_buffer(metabuf, false,
>> + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>> END_CRIT_SECTION();
>
> Maybe write the last sentence as '... as the on disk files are copied
> directly at the end of recovery.'?

Check.

>> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>> MAIN_FORKNUM,
>> state->rs_blockno,
>> state->rs_buffer,
>> - true);
>> + true,
>> + false);
>> RelationOpenSmgr(state->rs_new_rel);
>>
>> PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>> MAIN_FORKNUM,
>> state->rs_blockno,
>> page,
>> - true);
>> + true,
>> + false);
>
> Did you verify that that's ok when a unlogged table is clustered/vacuum
> full'ed?

Yep.

>> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>> case XLOG_FPI_FOR_HINT:
>> id = "FPI_FOR_HINT";
>> break;
>> + case XLOG_FPI_FLUSH:
>> + id = "FPI_FOR_SYNC";
>> + break;
>> }
>
> Old string.

Yeah, that's now completely removed.

>> --- a/src/backend/access/transam/xloginsert.c
>> +++ b/src/backend/access/transam/xloginsert.c
>> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>> * If the page follows the standard page layout, with a PageHeader and unused
>> * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>> * the unused space to be left out from the WAL record, making it smaller.
>> + *
>> + * If 'is_flush' is set to TRUE, relation will be requested to flush
>> + * immediately its data at replay after replaying this full page image.
>> */
>
> s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
> OS immediately after replaying the record'?

s/OS/stable storage?
--
Michael

Attachment Content-Type Size
0001-Ensure-consistent-on-disk-state-of-UNLOGGED-indexes-.patch text/x-patch 19.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-12-04 13:13:58 Re: Support of partial decompression for datums
Previous Message Ildus Kurbangaliev 2015-12-04 12:47:18 Support of partial decompression for datums