From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Error with index on unlogged table |
Date: | 2015-12-04 12:57:54 |
Message-ID: | CAB7nPqTJGbRdexanOk2HUXqWU1MDTxv4tSQ9MSQtBCvUJ=m9DQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Dec 4, 2015 at 5:10 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
>> Andres Freud wrote:
>> >> extern void InitXLogInsert(void);
>> >> diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
>> >> index ad1eb4b..91445f1 100644
>> >> --- a/src/include/catalog/pg_control.h
>> >> +++ b/src/include/catalog/pg_control.h
>> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
>> >> #define XLOG_END_OF_RECOVERY 0x90
>> >> #define XLOG_FPI_FOR_HINT 0xA0
>> >> #define XLOG_FPI 0xB0
>> >> +#define XLOG_FPI_FOR_SYNC 0xC0
>> >
>> >
>> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
>> > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or
>> > instead adding actual record data and a 'flags' field in there? I
>> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
>> > different, XLOG_FPI_FOR_SYNC not so much.
>>
>> Let's go for XLOG_FPI_FLUSH.
>
> I think the other way is a bit better, because we can add new flags
> without changing the WAL format.
Hm. On the contrary, I think that it would make more sense to have a
flag as well for FOR_HINT honestly, those are really the same
operations, and FOR_HINT is just here for statistic purposes.
>> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
>> index 99337b0..b646101 100644
>> --- a/src/backend/access/brin/brin.c
>> +++ b/src/backend/access/brin/brin.c
>> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>> brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>> BRIN_CURRENT_VERSION);
>> MarkBufferDirty(metabuf);
>> - log_newpage_buffer(metabuf, false);
>> +
>> + /*
>> + * For unlogged relations, this page should be immediately flushed
>> + * to disk after being replayed. This is necessary to ensure that the
>> + * initial on-disk state of unlogged relations is preserved when
>> + * they get reset at the end of recovery.
>> + */
>> + log_newpage_buffer(metabuf, false,
>> + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>> END_CRIT_SECTION();
>
> Maybe write the last sentence as '... as the on disk files are copied
> directly at the end of recovery.'?
Check.
>> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>> MAIN_FORKNUM,
>> state->rs_blockno,
>> state->rs_buffer,
>> - true);
>> + true,
>> + false);
>> RelationOpenSmgr(state->rs_new_rel);
>>
>> PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>> MAIN_FORKNUM,
>> state->rs_blockno,
>> page,
>> - true);
>> + true,
>> + false);
>
> Did you verify that that's ok when a unlogged table is clustered/vacuum
> full'ed?
Yep.
>> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>> case XLOG_FPI_FOR_HINT:
>> id = "FPI_FOR_HINT";
>> break;
>> + case XLOG_FPI_FLUSH:
>> + id = "FPI_FOR_SYNC";
>> + break;
>> }
>
> Old string.
Yeah, that's now completely removed.
>> --- a/src/backend/access/transam/xloginsert.c
>> +++ b/src/backend/access/transam/xloginsert.c
>> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>> * If the page follows the standard page layout, with a PageHeader and unused
>> * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>> * the unused space to be left out from the WAL record, making it smaller.
>> + *
>> + * If 'is_flush' is set to TRUE, relation will be requested to flush
>> + * immediately its data at replay after replaying this full page image.
>> */
>
> s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
> OS immediately after replaying the record'?
s/OS/stable storage?
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Ensure-consistent-on-disk-state-of-UNLOGGED-indexes-.patch | text/x-patch | 19.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2015-12-04 13:13:58 | Re: Support of partial decompression for datums |
Previous Message | Ildus Kurbangaliev | 2015-12-04 12:47:18 | Support of partial decompression for datums |