Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-02-05 17:43:44
Message-ID: CA+TgmoZPbfrrEMmHLptDK9tTdhJPB=yEDFDurgMB11cYm4LviQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 2, 2018 at 10:26 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> My proposed commit message
> has a full explanation of the Valgrind issue, which I won't repeat
> here. Go read it before reading the rest of this e-mail.

I'm going to paste the first two sentences of your proposed commit
message in here for the convenience of other readers, since I want to
reply to them.

# LogicalTapeFreeze() may write out its first block when it is dirty but
# not full, and then immediately read the first block back in from its
# BufFile as a BLCKSZ-width block. This can only occur in rare cases
# where next to no tuples were written out, which is only possible with
# parallel external tuplesorts.

So, if I understand correctly what you're saying here, valgrind is
totally cool with us writing out an only-partially-initialized block
to a disk file, but it's got a real problem with us reading that data
back into the same memory space it already occupies. That's a little
odd. I presume that it's common for the tail of the final block
written to be uninitialized, but normally when we then go read block
0, that's some other, fully initialized block.

It seems like it would be pretty easy to just suppress the useless
read when we've already got the correct data, and I'd lean toward
going that direction since it's a valid optimization anyway. But I'd
like to hear some opinions from people who use and think about
valgrind more than I do (Tom, Andres, Noah, ...?).

> It might seem like my suppression is overly broad, or not broad
> enough, since it essentially targets LogicalTapeFreeze(). I don't
> think it is, though, because this can occur in two places within
> LogicalTapeFreeze() -- it can occur in the place we actually saw the
> issue on lousyjack, from the ltsReadBlock() call within
> LogicalTapeFreeze(), as well as a second place -- when
> BufFileExportShared() is called. I found that you have to tweak code
> to prevent it happening in the first place before you'll see it happen
> in the second place.

I don't quite see how that would happen, because BufFileExportShared,
at least AFAICS, doesn't touch the buffer?

Unfortunately valgrind does not work at all on my laptop -- the server
appears to start, but as soon as you try to connect, the whole thing
dies with an error claiming that the startup process has failed. So I
can't easily test this at the moment. I'll try to get it working,
here or elsewhere, but thought I'd send the above reply first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2018-02-05 17:55:46 Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Previous Message Petr Jelinek 2018-02-05 15:33:14 Re: BUG #15044: materialized views incompatibility with logical replication in postgres 10