Re: Memory leak in GIN index build

From: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Marc Cousin <marc(dot)cousin(at)dalibo(dot)com>
Subject: Re: Memory leak in GIN index build
Date: 2016-04-16 21:56:25
Message-ID: 5712B509.7040403@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16/04/2016 20:45, Tom Lane wrote:
> Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> writes:
>
>> Also, in dataPlaceToPageLeaf() and ginVacuumPostingTreeLeaf(), shouldn't
>> the START_CRIT_SECTION() calls be placed before the xlog code?
>
> Yeah, they should. Evidently somebody kluged it to avoid doing a palloc
> inside a critical section, while ignoring every other rule about where to
> place critical sections. (Maybe we should put an assert about being
> within a critical section into XLogBeginInsert.)
>

After a quick test, it appears that at least log_smgrcreate() calls
XLogBeginInsert() without being in a critical section, from the various
DDL commands.

> This code is pretty fundamentally broken, isn't it. elog() calls
> inside a critical section? Really? Even worse, a MemoryContextDelete,
> which hardly seems like something that should be assumed error-proof.
>
> Not to mention the unbelievable ugliness of a function that sometimes
> returns with an open critical section (and WAL insertion started) and
> sometimes doesn't.
>
> It kinda looks like this was hacked up without bothering to keep the
> comment block in ginPlaceToPage in sync with reality, either.
>
> I think this needs to be redesigned so that the critical section and WAL
> insertion calls all happen within a single straight-line piece of code.
>
> We could try making that place be ginPlaceToPage(). So far as
> registerLeafRecompressWALData is concerned, that does not look that hard;
> it could palloc and fill the WAL-data buffer the same as it's doing now,
> then pass it back up to be registered (and eventually pfreed) in
> ginPlaceToPage. However, that would also mean postponing the call of
> dataPlaceToPageLeafRecompress(), which seems much messier. I assume
> the data it's working from is mostly in the tmpCtx that
> dataPlaceToPageLeaf wants to free before returning. Maybe we could
> move creation/destruction of the tmpCtx out to ginPlaceToPage, as well?
>
> The other line of thought is to move the WAL work that ginPlaceToPage
> does down into the subroutines. That would probably result in some
> duplication of code, but it might end up being a cleaner solution.
>
> Anyway, I think memory leakage is just the start of what's broken here,
> and fixing it won't be a very small patch.
>

I'm not really confident, but I'll give a try. Probably with your
second solution which looks easier.

Any pointer is welcome, unless someone more familiar with this code
wants to take it.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-04-16 22:04:39 Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Previous Message Tom Lane 2016-04-16 21:52:44 Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <