From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN improvements part 1: additional information |
Date: | 2014-01-17 11:05:08 |
Message-ID: | CAPpHfds0A03zwmYdhKUg6s01Vq1p30n01=3c-tct264QTpV--Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jan 15, 2014 at 10:47 AM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com>wrote:
> On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
>> On 14.1.2014 00:38, Tomas Vondra wrote:
>> > On 13.1.2014 18:07, Alexander Korotkov wrote:
>> >> On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra <tv(at)fuzzy(dot)cz
>> >> <mailto:tv(at)fuzzy(dot)cz>> wrote:
>> >>
>> >> On 8.1.2014 22:58, Alexander Korotkov wrote:
>> >> > Thanks for reporting. Fixed version is attached.
>> >>
>> >> I've tried to rerun the 'archie' benchmark with the current patch,
>> and
>> >> once again I got
>> >>
>> >> PANIC: could not split GIN page, didn't fit
>> >>
>> >> I reran it with '--enable-cassert' and with that I got
>> >>
>> >> TRAP: FailedAssertion("!(ginCompareItemPointers(&items[i - 1],
>> >> &items[i]) < 0)", File: "gindatapage.c", Line:
>> 149)
>> >> LOG: server process (PID 5364) was terminated by signal 6: Aborted
>> >> DETAIL: Failed process was running: INSERT INTO messages ...
>> >>
>> >> so the assert in GinDataLeafPageGetUncompressed fails for some
>> reason.
>> >>
>> >> I can easily reproduce it, but my knowledge in this area is rather
>> >> limited so I'm not entirely sure what to look for.
>> >>
>> >>
>> >> I've fixed this bug and many other bug. Now patch passes test suite
>> that
>> >> I've used earlier. The results are so:
>> >
>> > OK, it seems the bug is gone. However now there's a memory leak
>> > somewhere. I'm loading pgsql mailing list archives (~600k messages)
>> > using this script
>> >
>> > https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
>> >
>> > And after loading about 1/5 of the data, all the memory gets filled by
>> > the pgsql backends (loading the data in parallel) and the DB gets killed
>> > by the OOM killer.
>>
>> I've spent a fair amount of time trying to locate the memory leak, but
>> so far no luck. I'm not sufficiently familiar with the GIN code.
>>
>> I can however demonstrate that it's there, and I have rather simple test
>> case to reproduce it - basically just a CREATE INDEX on a table with ~1M
>> email message bodies (in a tsvector column). The data is available here
>> (360MB compressed, 1GB raw):
>>
>> http://www.fuzzy.cz/tmp/message-b.data.gz
>>
>> Simply create a single-column table, load data and create the index
>>
>> CREATE TABLE test ( body_tsvector TSVECTOR );
>> COPY test FROM '/tmp/message-b.data';
>> CREATE test_idx ON test USING gin test ( body_tsvector );
>>
>> I'm running this on a machine with 8GB of RAM, with these settings
>>
>> shared_buffers=1GB
>> maintenance_work_mem=1GB
>>
>> According to top, CREATE INDEX from the current HEAD never consumes more
>> than ~25% of RAM:
>>
>> PID USER PR NI VIRT RES SHR %CPU %MEM COMMAND
>> 32091 tomas 20 0 2026032 1,817g 1,040g 56,2 23,8 postgres
>>
>> which is about right, as (shared_buffers + maintenance_work_mem) is
>> about 1/4 of RAM.
>>
>> With the v5 patch version applied, the CREATE INDEX process eventually
>> goes crazy and allocates almost all the available memory (but somesimes
>> finishes, mostly by pure luck). This is what I was able to get from top
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM COMMAND
>> 14090 tomas 20 0 7913820 6,962g 955036 D 4,3 91,1 postgres
>>
>> while the system was still reasonably responsive.
>>
>
> Thanks a lot for your help! I believe problem is that each decompressed
> item pointers array is palloc'd but not freed. I hope to fix it today.
>
Seems to be fixed in the attached version of patch.
Another improvement in this version: only left-most segments and further
are re-encoded. Left part of page are left untouched.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
gin-packed-postinglists-varbyte6.patch.gz | application/x-gzip | 30.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Emre Hasegeli | 2014-01-17 11:14:31 | Re: [PATCH] pgcrypto: implement gen_random_uuid |
Previous Message | Marko Tiikkaja | 2014-01-17 10:45:24 | Re: plpgsql.warn_shadow |