Re: SPI Concurrency Precautions? Problems with Parallel Execution of Multiple CREATE TABLE statements

From: Tom Mercha <mercha_t(at)hotmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SPI Concurrency Precautions? Problems with Parallel Execution of Multiple CREATE TABLE statements
Date: 2020-02-22 00:20:06
Message-ID: AM6PR05MB5080A6871E34CA17B196F922F4EE0@AM6PR05MB5080.eurprd05.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17/02/2020 21:24, Tom Mercha wrote:
> Dear Hackers
>
> I've been working on an extension and using SPI to execute some queries.
> I am in a situation where I have the option to issue multiple queries
> concurrently, ideally under same snapshot and transaction. In short, I
> am achieving this by creating multiple dynamic background workers, each
> one of them executing a query at the same time using
> SPI_execute(sql_string, ...). To be more precise, sometimes I am also
> opting to issue a 'CREATE TABLE AS <sql_query>' command, an SPI utility
> command.
>
> I was however wondering whether I can indeed achieve concurrency in this
> way. My initial results are not showing much difference compared to a
> not concurrent implementation. If there would be a large lock somewhere
> in SPI implementation obviously this can be counterintuitive. What would
> be the precautions I would need to consider when working with SPI in
> this manner?
>
> Thanks,
> Tom

Dear Hackers

I have run some tests to try and better highlight my issue as I am still
struggling a lot with it.

I have 4 'CREATE TABLE AS' statements of this nature: "CREATE TABLE
<different_tbl_name> AS <same_query>". This means that I have different
table names for the same query.

I am spawning a number of dynamic background workers to execute each
statement. When I spawn 4 workers on a quad-core machine, the resutling
execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}.
However, when I have just one worker, the results are {0.151s, 0.141s,
0.146s, 0.136s}.

The way I am executing my statements is through SPI in each worker
(using a PG extension) as follows:
SPI_connect();
SPI_exec(queryString, 0);
SPI_finish();
In both test cases, SPI_connect/finish are executed 4 times.

What I expect is that with 4 workers, each statements will take approx
0.15s to execute since they are independent from each other. This would
result in approx a 4x speedup. Despite seeing concurrency, I am seeing
that each invdividual statement will take longer to execute. I am
struggling to understand this behavior, what this suggests to me is that
there is a lock somewhere which completely defeats my purpose.

I was wondering how I could execute my CREATE TABLE statements in a
parallel fashion given that they are independent from each other. If the
lock is the problem, what steps could I take to relax it? I would
greatly appreciate any guidance or insights on this topic.

Thanks,
Tom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-02-22 00:28:02 Re: Parallel copy
Previous Message Magnus Hagander 2020-02-21 21:40:04 Re: Do we ever edit release notes after release to add warnings about incompatibilities?