Re: using custom scan nodes to prototype parallel sequential scan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using custom scan nodes to prototype parallel sequential scan
Date: 2014-11-14 01:12:45
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8010776AF@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> On 12 November 2014 07:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >>
> >> This plan type is widely used in reporting queries, so will hit the
> >> mainline of BI applications and many Mat View creations.
> >> This will allow SELECT count(*) FROM foo to go faster also.
> >>
> >
> > We'd also need to add some infrastructure to merge aggregate states
> > together for this to work properly. This means it could also work
> > for avg() and stddev() etc. For max() and min() the merge functions
> > would likely just be the same as the transition functions.
>
>
> Do you mean something like a "subtotal" or "intermediate combination"
> function?
>
> If you had 4 parallel workers performing a seqscan, say the relation was
> 4000 pages in total, you could say that worker 1 would scan blocks 0-999,
> worker 2 blocks 1000-1999, etc. After each worker had finished, there
> would then be 4 sets of records that needed to be merged into 1 set.
>
> Take int4_avg_accum() for example; it does:
>
> transdata->count++;
> transdata->sum += newval;
>
> The merge function would need to perform something like:
>
> transdata->count += transdata2merge.count;
> transdata->sum += transdata2merge.sum;
>
> Then the final function could be called on the merged aggregate state.
>
> The same can be applied when the query contains a GROUP BY clause; we'd
> just need to pay attention to which groups we merge together for that to
> work. Any HAVING clause would have to be applied after the groups have
> been merged.
>
> This whole topic is pretty exciting for data warehouse type workloads.
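
To make the merge step concrete: if each worker ships back a (count, sum)
pair, merging the partial states is plain addition, and the final step is a
single division. A toy illustration (the worker states below are made-up
numbers, not actual parallel-scan output):

SELECT sum(nrows)                       AS merged_count,
       sum(sum_x)                       AS merged_sum,
       sum(sum_x)::numeric / sum(nrows) AS final_avg
FROM (VALUES (250::int8, 1000::int8),   -- worker 1's partial state
             (250::int8,  900::int8),   -- worker 2's partial state
             (250::int8, 1100::int8),   -- worker 3's partial state
             (250::int8, 1000::int8))   -- worker 4's partial state
     AS partial_states(nrows, sum_x);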
>
More simply, we can define a parallel-aware aggregate function. Assume an
AVG variant that takes nrows and the sum of X as its arguments: its
transition function accumulates them as you described above, and its final
function works as usual.

The jobs the parallel seq scan needs to do are:
1. replace AVG(X) with AVG(nrows, sum(X))
2. generate count(*) on each partial relation, per group.
3. generate sum(X) on each partial relation, per group.

It looks like the following query:

SELECT cat, AVG(nrows, sum_X) FROM (
  SELECT cat, count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno BETWEEN 0 AND 999 GROUP BY cat
  UNION ALL
  SELECT cat, count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno BETWEEN 1000 AND 1999 GROUP BY cat
  UNION ALL
  SELECT cat, count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno BETWEEN 2000 AND 2999 GROUP BY cat
  UNION ALL
  SELECT cat, count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno BETWEEN 3000 AND 3999 GROUP BY cat
) partial GROUP BY cat;

Note that the stuff inside the sub-queries is the job of the parallel scan
workers (and it must be UNION ALL, not UNION, so that identical partial
rows are not collapsed).
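
For reference, the two-argument AVG above could be declared as an ordinary
aggregate. A rough sketch only; the function names and the int8[] state
representation are illustrative, not an actual implementation:

-- transition: fold one (nrows, sum_X) pair into the running state
CREATE FUNCTION avg2_accum(state int8[], nrows int8, sum_x int8)
RETURNS int8[] AS $$
    SELECT ARRAY[ state[1] + nrows, state[2] + sum_x ]
$$ LANGUAGE sql;

-- final: divide the accumulated sum by the accumulated row count
CREATE FUNCTION avg2_final(state int8[])
RETURNS numeric AS $$
    SELECT CASE WHEN state[1] = 0 THEN NULL
                ELSE state[2]::numeric / state[1] END
$$ LANGUAGE sql;

CREATE AGGREGATE avg(int8, int8) (
    SFUNC     = avg2_accum,
    STYPE     = int8[],
    FINALFUNC = avg2_final,
    INITCOND  = '{0,0}'
);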

Do we need to invent any new infrastructure here?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
