Re: Parallel bitmap index scan

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap index scan
Date: 2020-08-17 14:18:10
Message-ID: CAFiTN-uB7md0C=cV+Fd8aN=uBcNedtigEgOFWno7EnGefcYXBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 17 Aug 2020 at 7:42 PM, Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Sun, Jul 26, 2020 at 6:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> >
>
> > I would like to propose a patch for enabling the parallelism for the
>
> > bitmap index scan path.
>
> >
>
> > Background:
>
> > Currently, we support only a parallel bitmap heap scan path. Therein,
>
> > the underlying bitmap index scan is done by a single worker called the
>
> > leader. The leader creates a bitmap in shared memory and once the
>
> > bitmap is ready it creates a shared iterator and after that, all the
>
> > workers process the shared iterator and scan the heap in parallel.
>
> > While analyzing the TPCH plan we have observed that some of the
>
> > queries are spending significant time in preparing the bitmap. So the
>
> > idea of this patch is to use the parallel index scan for preparing the
>
> > underlying bitmap in parallel.
>
> >
>
> > Design:
>
> > If underlying index AM supports the parallel path (currently only
>
> > BTREE support it), then we will create a parallel bitmap heap scan
>
> > path on top of the parallel bitmap index scan path. So the idea of
>
> > this patch is that each worker will do the parallel index scan and
>
> > generate their part of the bitmap. And, we will create a barrier so
>
> > that we can not start preparing the shared iterator until all the
>
> > worker is ready with their bitmap. The first worker, which is ready
>
> > with the bitmap will keep a copy of its TBM and the page table in the
>
> > shared memory. And, all the subsequent workers will merge their TBM
>
> > with the shared TBM. Once all the TBM are merged we will get one
>
> > common shared TBM and after that stage, the worker can continue. The
>
> > remaining part is the same, basically, again one worker will scan the
>
> > shared TBM and prepare the shared iterator and once it is ready all
>
> > the workers will jointly scan the heap in parallel using shared
>
> > iterator.
>
> >
>
>
>
> Though I have not looked at the patch or code for the existing
>
> parallel bitmap heap scan, one point keeps bugging in my mind. I may
>
> be utterly wrong or my question may be so silly, anyways I would like
>
> to ask here:
>
>
>
> From the above design: each parallel worker creates partial bitmaps
>
> for the index data that they looked at. Why should they merge these
>
> bitmaps to a single bitmap in shared memory? Why can't each parallel
>
> worker do a bitmap heap scan using the partial bitmaps they built
>
> during it's bitmap index scan and emit qualified tuples/rows so that
>
> the gather node can collect them? There may not be even lock
>
> contention as bitmap heap scan takes read locks for the heap
>
> pages/tuples.

The main reason is that there could be lossy pages in bitmap and if that is
the case then there will be duplicate data. Maybe if there is no lossy
data in any of the bitmap we might do as u describe but still I think that
it is very much possible that different bitmap might have many common heap
pages because bitmap is prepared using index scan. And in such cases we
will be doing duplicate i/o.

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-08-17 14:20:46 Re: Newline after --progress report
Previous Message Andy Fan 2020-08-17 14:12:39 Re: Improve planner cost estimations for alternative subplans