Re: Parallel stats in execution plans

From: David Conlin <dc345(at)cantab(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel stats in execution plans
Date: 2019-01-24 08:18:03
Message-ID: 62ed1161-859a-6bc6-c263-a6e2b74f670e@cantab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

It seems like no-one has any ideas on this - does anyone know anywhere
else I can try to look/ask to find out more?

Is it possible that this is a bug?

Thanks

Dave

On 16/01/2019 11:31, David Conlin wrote:
>
> Hi folks -
>
> I'm having trouble understanding what some of the stats mean in the
> execution plan output when parallel workers are used. I've tried to
> read up about it, but I haven't been able to find anything that
> explains what I'm seeing. Apologies in advance if there's
> documentation I've been too stupid to find.
>
> I've run the following query. The "towns" table is a massive table
> that I created in order to get some big numbers on a parallel query -
> don't worry, this isn't a real query I want to make faster, just a
> silly example I'd like to understand.
>
> EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE)
> SELECT name, code, article
> FROM towns
> ORDER BY nameASC,
> codeDESC;
>
> The output looks like this:
>
> [
> {
> "Plan": {
> "Node Type": "Gather Merge", "Parallel Aware": false, "Startup Cost":
> 1013948.54, "Total Cost": 1986244.55, "Plan Rows": 8333384, "Plan
> Width": 77, "Actual Startup Time": 42978.838, "Actual Total Time":
> 60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Output": ["name", "code", "article"], "Workers Planned": 2, "Workers Launched": 2, "Shared Hit Blocks":
> 29, "Shared Read Blocks": 47641, "Shared Dirtied Blocks": 0, "Shared
> Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0,
> "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
> Blocks": 91342, "Temp Written Blocks": 91479, "Plans": [
> {
> "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware":
> false, "Startup Cost": 1012948.52, "Total Cost": 1023365.25, "Plan
> Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 42765.496,
> "Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual
> Loops": 3, "Output": ["name", "code", "article"], "Sort Key": ["towns.name", "towns.code DESC"], "Sort Method": "external merge", "Sort Space Used": 283856, "Sort
> Space Type": "Disk", "Shared Hit Blocks": 170, "Shared Read Blocks":
> 142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local
> Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0,
> "Local Written Blocks": 0, "Temp Read Blocks": 273289, "Temp Written
> Blocks": 273700, "Workers": [
> {
> "Worker Number": 0, "Actual Startup Time": 42588.662, "Actual Total
> Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1, "Shared
> Hit Blocks": 72, "Shared Read Blocks": 46794, "Shared Dirtied Blocks":
> 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read
> Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
> "Temp Read Blocks": 89067, "Temp Written Blocks": 89202 }, {
> "Worker Number": 1, "Actual Startup Time": 42946.705, "Actual Total
> Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1, "Shared
> Hit Blocks": 69, "Shared Read Blocks": 48327, "Shared Dirtied Blocks":
> 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read
> Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
> "Temp Read Blocks": 92880, "Temp Written Blocks": 93019 }
> ], "Plans": [
> {
> "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel
> Aware": true, "Relation Name": "towns", "Schema": "public", "Alias":
> "towns", "Startup Cost": 0.00, "Total Cost": 184524.92, "Plan Rows":
> 4166692, "Plan Width": 77, "Actual Startup Time": 0.322, "Actual Total
> Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Shared Hit Blocks": 96, "Shared Read Blocks": 142762, "Shared
> Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0,
> "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written
> Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "Workers": [
> {
> "Worker Number": 0, "Actual Startup Time": 0.105, "Actual Total Time":
> 8394.629, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit
> Blocks": 35, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0,
> "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read
> Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
> "Temp Read Blocks": 0, "Temp Written Blocks": 0 }, {
> "Worker Number": 1, "Actual Startup Time": 0.113, "Actual Total Time":
> 8139.382, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit
> Blocks": 32, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0,
> "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read
> Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
> "Temp Read Blocks": 0, "Temp Written Blocks": 0 }
> ]
> }
> ]
> }
> ]
> }, "Planning Time": 22.898, "Triggers": [
> ], "Execution Time": 61133.161 }
> ]
>
> Or a more slimmed-down version, with just the confusing fields:
>
> [
> {
> "Plan": {
> "Node Type": "Gather Merge", "Parallel Aware": false,"Actual Total
> Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1,"Workers
> Planned": 2, "Workers Launched": 2,"Plans": [
> {
> "Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware":
> false,"Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual
> Loops": 3,"Workers": [
> {
> "Worker Number": 0,"Actual Total Time": 48456.662, "Actual Rows":
> 3277980, "Actual Loops": 1}, {
> "Worker Number": 1,"Actual Total Time": 48799.414, "Actual Rows":
> 3385130, "Actual Loops": 1}
> ], "Plans": [
> {
> "Node Type": "Seq Scan","Parallel Aware": true,"Actual Total Time":
> 8305.886, "Actual Rows": 3333337, "Actual Loops": 3,"Workers": [
> {
> "Worker Number": 0,"Actual Total Time": 8394.629, "Actual Rows":
> 3277980, "Actual Loops": 1}, {
> "Worker Number": 1,"Actual Total Time": 8139.382, "Actual Rows":
> 3385130, "Actual Loops": 1}
> ]
> }
> ]
> }
> ]
> },"Execution Time": 61133.161 }
> ]
>
> The things I'm struggling to understand are:
>
> * How the time values combine with parallelism. For example, each
> execution of the sort node takes an average of 48.5s, over three
> loops. This makes a total running time of 145.5s. Even if this was
> perfectly distributed between the two workers, I would expect this
> to take 72.75s, which is more than the total execution time, so it
> can't take this long.
> *  How the row numbers combine with those in the "Workers" subkey.
> For example, in the Sort node, worker #0 produces 3,277,980 rows,
> while worker #1 produces 3,385,130 rows. The Sort node as a whole
> produces 3,333,337 rows per loop, for a total of 10,000,010 (the
> value in the gather merge node). I would have expected the number
> of rows produced by the two workers to sum to the number produced
> by the Sort node as a whole, either per loop or in total.
> * How the "Actual Loops" values combine with those in the "Workers"
> subkey. For example, the "Sort" node has 3 loops, but each of the
> workers inside it have 1 loop. I would have expected either:
> o each of the workers to have done 3 loops (since the sort is
> executed 3 times), or
> o the number of loops in the two workers to sum to three (if the
> three executions of the sort are distributed across the two
> workers)
>
> Other info about my setup:
>
> * Postgres version: "PostgreSQL 10.4, compiled by Visual C++ build
> 1800, 64-bit"
> * I installed Postgres using the EnterpriseDB one-click installer
> * OS: Windows 10, v1803
> * I'm using Jetbrains Datagrip to connect to Postgres
> * No errors are logged.
> * Altered config settings:
> *
>
>
> application_name PostgreSQL JDBC Driver session
> client_encoding UTF8 client
> DateStyle ISO, MDY client
> default_text_search_config pg_catalog.english configuration file
> dynamic_shared_memory_type windows configuration file
> extra_float_digits 3 session
> lc_messages English_United States.1252 configuration file
> lc_monetary English_United States.1252 configuration file
> lc_numeric English_United States.1252 configuration file
> lc_time English_United States.1252 configuration file
> listen_addresses * configuration file
> log_destination stderr configuration file
> logging_collector on configuration file
> max_connections 100 configuration file
> max_stack_depth 2MB environment variable
> port 5432 configuration file
> shared_buffers 128MB configuration file
> TimeZone UTC client
>
> Thanks in advance, I'm sure I've done something silly or misunderstood
> something obvious but I can't work out what it is for the life of me.
>
> Dave
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Rowley 2019-01-24 11:55:42 Re: Parallel stats in execution plans
Previous Message Mariel Cherkassky 2019-01-24 06:32:49 Re: ERROR: found xmin from before relfrozenxid