From: | "Constantin S(dot) Pan" <kvapen(at)gmail(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] speeding up GIN build with parallel workers |
Date: | 2016-03-16 00:11:15 |
Message-ID: | 20160316031115.5856920c@monster |
Lists: | pgsql-hackers |
On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david(at)pgmasters(dot)net> wrote:
> On 2/18/16 10:10 AM, Constantin S. Pan wrote:
> > On Wed, 17 Feb 2016 23:01:47 +0300
> > Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> >
> >> My feedback is (Mac OS X 10.11.3)
> >>
> >> set gin_parallel_workers=2;
> >> create index message_body_idx on messages using gin(body_tsvector);
> >> LOG: worker process: parallel worker for PID 5689 (PID 6906) was
> >> terminated by signal 11: Segmentation fault
> >
> > Fixed this, try the new patch. The bug was in incorrect handling
> > of some GIN categories.
>
> Oleg, it looks like Constantin has updated the patch to address the
> issue you were seeing. Do you have time to retest and review?
>
> Thanks,
Actually, there has been some progress since then. The updated patch
is attached.
1. Added another GUC parameter, 'gin_shared_mem', for changing the
amount of shared memory available to parallel GIN workers.
2. Changed the way results are merged. It now uses a shared memory
message queue.
3. Tested on some real data (a GIN index on email message body
tsvectors). Here are the timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. 'gin_shared_mem' seems to have no visible effect.
workers  mem(MB)  time(s)
      0       16      247
      1       16      256
      2       16      126
      4       16       89
      0       32      247
      1       32      270
      2       32      123
      4       32       92
      0       64      254
      1       64      272
      2       64      123
      4       64       88
      0      128      250
      1      128      263
      2      128      126
      4      128       85
      0      256      247
      1      256      269
      2      256      130
      4      256       88
      0      512      257
      1      512      275
      2      512      129
      4      512       92
      0     1024      255
      1     1024      273
      2     1024      130
      4     1024       90
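To make the numbers easier to read, here is a minimal Python sketch that
turns the timings above into speedups relative to the 0-worker run at the
same memory setting. The names 'timings', 'speedups' and 'mean_speedup' are
made up for this illustration and are not part of the patch. Averaged over
the memory settings, it should come out at roughly 2.8x for 4 workers and
slightly below 1x for 1 worker.

```python
# Timings copied from the table above: {mem_mb: {workers: seconds}}.
timings = {
    16:   {0: 247, 1: 256, 2: 126, 4: 89},
    32:   {0: 247, 1: 270, 2: 123, 4: 92},
    64:   {0: 254, 1: 272, 2: 123, 4: 88},
    128:  {0: 250, 1: 263, 2: 126, 4: 85},
    256:  {0: 247, 1: 269, 2: 130, 4: 88},
    512:  {0: 257, 1: 275, 2: 129, 4: 92},
    1024: {0: 255, 1: 273, 2: 130, 4: 90},
}


def speedups(by_workers):
    """Speedup of each worker count relative to the 0-worker run."""
    base = by_workers[0]
    return {w: base / t for w, t in by_workers.items()}


def mean_speedup(workers):
    """Average speedup for a worker count across all memory settings."""
    vals = [speedups(row)[workers] for row in timings.values()]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Per-setting speedups, one line per memory size.
    for mem, row in timings.items():
        line = "  ".join(f"{w}w: {s:.2f}x" for w, s in speedups(row).items())
        print(f"{mem:>5} MB  {line}")
    # Averages across memory sizes, one line per worker count.
    for w in (1, 2, 4):
        print(f"mean speedup with {w} worker(s): {mean_speedup(w):.2f}x")
```

The baseline is taken per memory setting, so run-to-run noise in the
0-worker timings does not leak across rows.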
On Wed, 17 Feb 2016 12:26:05 -0800
Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen(at)gmail(dot)com>
> wrote:
> > 4. Hit the 8x speedup limit. Made some analysis of the reasons (see
> > the attached plot or the data file).
>
> Did you actually compare this to the master branch? I wouldn't like to
> assume that the one worker case was equivalent. Obviously that's the
> really interesting baseline.
Yes, I compared it with the master branch. The case of 0 workers is
indeed equivalent to the master branch.
Regards,
Constantin
Attachment | Content-Type | Size |
---|---|---|
pgin-5.patch | text/x-patch | 20.5 KB |