Re: SPI Concurrency Precautions? Problems with Parallel Execution of Multiple CREATE TABLE statements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Mercha <mercha_t(at)hotmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SPI Concurrency Precautions? Problems with Parallel Execution of Multiple CREATE TABLE statements
Date: 2020-02-25 10:34:14
Message-ID: CA+TgmoY=91PUrucg+BwJFF=_khUUPPCvPop+67nafrXbfus3uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 22, 2020 at 5:50 AM Tom Mercha <mercha_t(at)hotmail(dot)com> wrote:
> I am spawning a number of dynamic background workers to execute each
> statement. When I spawn 4 workers on a quad-core machine, the resutling
> execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}.
> However, when I have just one worker, the results are {0.151s, 0.141s,
> 0.146s, 0.136s}.
>
> The way I am executing my statements is through SPI in each worker
> (using a PG extension) as follows:
> SPI_connect();
> SPI_exec(queryString, 0);
> SPI_finish();
> In both test cases, SPI_connect/finish are executed 4 times.
>
> What I expect is that with 4 workers, each statements will take approx
> 0.15s to execute since they are independent from each other. This would
> result in approx a 4x speedup. Despite seeing concurrency, I am seeing
> that each invdividual statement will take longer to execute. I am
> struggling to understand this behavior, what this suggests to me is that
> there is a lock somewhere which completely defeats my purpose.
>
> I was wondering how I could execute my CREATE TABLE statements in a
> parallel fashion given that they are independent from each other. If the
> lock is the problem, what steps could I take to relax it? I would
> greatly appreciate any guidance or insights on this topic.

Well, I'm not altogether sure that your expectations are realistic.
Rarely do things parallelize perfectly. In a case like this, some time
is probably being spent doing disk I/O. When multiple processes do CPU
work at the same time, you should be able to see near-linear speedup,
but when multiple processes do disk I/O at the same time, you may see
no speedup at all, or even a slowdown, because of the way that disks
work. This seems especially likely given how short the queries are and
the fact that they create a new table, which involves an fsync()
operation.

It's possible that if you run a query to select the wait events from
pg_stat_activity, maybe using psql's \watch with a fractional value,
you might be able to see something about what those queries are
actually spending time on. It's also possible that you might get more
interesting results if you have things that run for longer than a few
hundred milliseconds. But in general I would question the assumption
that this ought to scale well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-02-25 10:44:25 Re: Yet another vectorized engine
Previous Message Vik Fearing 2020-02-25 09:58:54 Re: progress reporting views and TimestampTz fields