From: | Thom Brown <thom(at)linux(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Seq Scan |
Date: | 2015-03-25 11:46:08 |
Message-ID: | CAA-aLv6JMAsDOg7R6DzvcWgLCSukGK_Ap4gRfiC+1NgWaqHAVw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 25 March 2015 at 10:27, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Mar 20, 2015 at 5:36 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> >
> > So the patches have to be applied in below sequence:
> > HEAD Commit-id : 8d1f2390
> > parallel-mode-v8.1.patch [2]
> > assess-parallel-safety-v4.patch [1]
> > parallel-heap-scan.patch [3]
> > parallel_seqscan_v11.patch (Attached with this mail)
> >
> > The reason for not using the latest commit in HEAD is that latest
> > version of assess-parallel-safety patch was not getting applied,
> > so I generated the patch at commit-id where I could apply that
> > patch successfully.
> >
> > [1] -
> http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
> > [2] -
> http://www.postgresql.org/message-id/CA+TgmoZJjzYnpXChL3gr7NwRUzkAzPMPVKAtDt5sHvC5Cd7RKw@mail.gmail.com
> > [3] -
> http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
> >
>
> Fixed the reported issue on assess-parallel-safety thread and another
> bug caught while testing joins and integrated with latest version of
> parallel-mode patch (parallel-mode-v9 patch).
>
> Apart from that I have moved the Initialization of dsm segement from
> InitNode phase to ExecFunnel() (on first execution) as per suggestion
> from Robert. The main idea is that as it creates large shared memory
> segment, so do the work when it is really required.
>
>
> HEAD Commit-Id: 11226e38
> parallel-mode-v9.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v12.patch (Attached with this mail)
>
> [1] -
> http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
> [2] -
> http://www.postgresql.org/message-id/CA+TgmoZfSXZhS6qy4Z0786D7iU_AbhBVPQFwLthpSvGieczqHg@mail.gmail.com
> [3] -
> http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
>
Okay, with my pgbench_accounts partitioned into 300, I ran:
SELECT DISTINCT bid FROM pgbench_accounts;
The query never returns, and I also get this:
grep -r 'starting background worker process "parallel worker for PID
12165"' postgresql-2015-03-25_112522.log | wc -l
2496
2,496 workers? This is with parallel_seqscan_degree set to 8. If I set it
to 2, this number goes down to 626, and with 16, goes up to 4320.
Here's the query plan:
QUERY
PLAN
---------------------------------------------------------------------------------------------------------
HashAggregate (cost=38856527.50..38856529.50 rows=200 width=4)
Group Key: pgbench_accounts.bid
-> Append (cost=0.00..38806370.00 rows=20063001 width=4)
-> Seq Scan on pgbench_accounts (cost=0.00..0.00 rows=1 width=4)
-> Funnel on pgbench_accounts_1 (cost=0.00..192333.33
rows=100000 width=4)
Number of Workers: 8
-> Partial Seq Scan on pgbench_accounts_1
(cost=0.00..1641000.00 rows=100000 width=4)
-> Funnel on pgbench_accounts_2 (cost=0.00..192333.33
rows=100000 width=4)
Number of Workers: 8
-> Partial Seq Scan on pgbench_accounts_2
(cost=0.00..1641000.00 rows=100000 width=4)
-> Funnel on pgbench_accounts_3 (cost=0.00..192333.33
rows=100000 width=4)
Number of Workers: 8
...
-> Partial Seq Scan on pgbench_accounts_498
(cost=0.00..10002.10 rows=210 width=4)
-> Funnel on pgbench_accounts_499 (cost=0.00..1132.34 rows=210
width=4)
Number of Workers: 8
-> Partial Seq Scan on pgbench_accounts_499
(cost=0.00..10002.10 rows=210 width=4)
-> Funnel on pgbench_accounts_500 (cost=0.00..1132.34 rows=210
width=4)
Number of Workers: 8
-> Partial Seq Scan on pgbench_accounts_500
(cost=0.00..10002.10 rows=210 width=4)
Still not sure why 8 workers are needed for each partial scan. I would
expect 8 workers to be used for 8 separate scans. Perhaps this is just my
misunderstanding of how this feature works.
--
Thom
From | Date | Subject | |
---|---|---|---|
Next Message | Sawada Masahiko | 2015-03-25 11:46:41 | Re: Auditing extension for PostgreSQL (Take 2) |
Previous Message | Shigeru HANADA | 2015-03-25 11:41:13 | Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) |