Re: Memory leak in GIN index build

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Marc Cousin <marc(dot)cousin(at)dalibo(dot)com>
Subject: Re: Memory leak in GIN index build
Date: 2016-04-18 14:33:04
Message-ID: 23849.1460989984@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> writes:
> On 16/04/2016 20:45, Tom Lane wrote:
>> I think this needs to be redesigned so that the critical section and WAL
>> insertion calls all happen within a single straight-line piece of code.
>>
>> We could try making that place be ginPlaceToPage(). So far as
>> registerLeafRecompressWALData is concerned, that does not look that hard;
>> it could palloc and fill the WAL-data buffer the same as it's doing now,
>> then pass it back up to be registered (and eventually pfreed) in
>> ginPlaceToPage. However, that would also mean postponing the call of
>> dataPlaceToPageLeafRecompress(), which seems much messier. I assume
>> the data it's working from is mostly in the tmpCtx that
>> dataPlaceToPageLeaf wants to free before returning. Maybe we could
>> move creation/destruction of the tmpCtx out to ginPlaceToPage, as well?
>>
>> The other line of thought is to move the WAL work that ginPlaceToPage
>> does down into the subroutines. That would probably result in some
>> duplication of code, but it might end up being a cleaner solution.

> I'm not really confident, but I'll give a try. Probably with your
> second solution which looks easier.

I poked at this over the weekend, and got more unhappy the more I poked.
Aside from the memory leakage issue, there are multiple coding-rule
violations besides the one you noted about scope of the critical sections.
One example is that in the page-split path for an inner page, we modify
the contents of childbuf long before getting into the critical section.
The number of out-of-date comments was depressingly large as well.

I ended up deciding that the most reasonable fix was along the lines of
my first alternative above. In the attached draft patches, the
"placeToPage" method is split into two methods, "beginPlaceToPage" and
"execPlaceToPage", where the second method is what's supposed to happen
inside the critical section for a non-page-splitting update. Nothing
that's done inside beginPlaceToPage is critical.

I've tested this to the extent of verifying that it passes make
check-world and stops the memory leak in your test case, but it could use
more eyeballs on it.

Attached are draft patches against HEAD and 9.5 (they're the same except
for BufferGetPage calls). I think we'd also want to back-patch to 9.4,
but that will take considerable additional work because of the XLOG API
rewrite that happened in 9.5. I also noted that some of the coding-rule
violations seem to be new in 9.5, so the problems may be less severe in
9.4 --- the memory leak definitely exists there, though.

regards, tom lane

Attachment Content-Type Size
ginplacetopage-fixes-head.patch text/x-diff 46.8 KB
ginplacetopage-fixes-9.5.patch text/x-diff 46.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-04-18 14:37:25 Re: pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.
Previous Message Andres Freund 2016-04-18 14:27:50 Re: pgsql: Allow Pin/UnpinBuffer to operate in a lockfree manner.