Re: mdnblocks() sabotages error checking in _mdfd_getseg()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Date: 2015-12-15 14:51:51
Message-ID: 20151215145151.GC26501@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-12-09 16:50:06 -0500, Robert Haas wrote:
> Now, if subsequent to this an index scan happens to sweep through and
> try to fetch a block in 123456.2, it will work! This happens because
> _mdfd_getseg() doesn't care about the length of the segments; it only
> cares whether or not they exist.

Unless in recovery in the startup process, or when EXTENSION_CREATE is
passed to it. Depending on whether it or mdnblocks were called first,
and depending on which segment is missing. In that case it'd *possibly*
pad the last block in a sgment with zeroes, Yes, only the last block:
*
* We have to maintain the invariant that segments before the last
* active segment are of size RELSEG_SIZE; therefore, pad them out
* with zeroes if needed. (This only matters if caller is
* extending the relation discontiguously, but that can happen in
* hash indexes.)
*/
if (behavior == EXTENSION_CREATE || InRecovery)
{
if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
{
char *zerobuf = palloc0(BLCKSZ);

mdextend(reln, forknum,
nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
zerobuf, skipFsync);
pfree(zerobuf);
}
v->mdfd_chain = _mdfd_openseg(reln, forknum, +nextsegno, O_CREAT);
}

> If 123456.1 were actually missing,
> then we'd never test whether 123456.2 exists and we'd get an error.
> But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite
> happy to fetch blocks in 123456.2; it considers the empty 123456.1
> file to be a sufficient reason to look for 123456.2, and seeing that
> file, and finding the block it wants therein, it happily returns that
> block, blithely ignoring the fact that it passed over a completely .1
> segment before returning a block from .2. This is maybe not the
> smartest thing ever.

I think it might be worthwhile to add a check against this. It'd not be
very hard to ensure the segment is indeed large enough iff further
segments exist. Possibly not an ERROR, but a WARNING might be useful. To
avoid overhead and absurd log spamming we should make that check only if
the the segment isn't already open.

> The comment in mdnblocks.c says this:
>
> * Because we pass O_CREAT, we will create the
> next segment (with
> * zero length) immediately, if the last
> segment is of length
> * RELSEG_SIZE. While perhaps not strictly
> necessary, this keeps
> * the logic simple.
>
> I don't really see how this "keeps the logic simple". What it does is
> allow sequential scans and index scans to have two completely
> different notions of how many accessible blocks there are in the
> relation. Granted, this kind of thing really shouldn't happen, but
> sometimes bad things do happen. Therefore, I propose the attached
> patch.

I'm in favor of changing this behaviour - it's indeed weird and somewhat
error prone.

Do you want to change this in master, or backpatch a change? To me the
code in md.c is subtle and complicated enough that I'm rather inclined
to only change this in master.

This code is really hard to test effectively :(.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-12-15 15:53:58 Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Previous Message Andres Freund 2015-12-15 14:17:34 Re: Fixing warnings in back branches?