From: | Tom Mercha <mercha_t(at)hotmail(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SPI Concurrency Precautions? Problems with Parallel Execution of Multiple CREATE TABLE statements |
Date: | 2020-02-22 00:20:06 |
Message-ID: | AM6PR05MB5080A6871E34CA17B196F922F4EE0@AM6PR05MB5080.eurprd05.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 17/02/2020 21:24, Tom Mercha wrote:
> Dear Hackers
>
> I've been working on an extension and using SPI to execute some queries.
> I am in a situation where I have the option to issue multiple queries
> concurrently, ideally under same snapshot and transaction. In short, I
> am achieving this by creating multiple dynamic background workers, each
> one of them executing a query at the same time using
> SPI_execute(sql_string, ...). To be more precise, sometimes I am also
> opting to issue a 'CREATE TABLE AS <sql_query>' command, an SPI utility
> command.
>
> I was however wondering whether I can indeed achieve concurrency in this
> way. My initial results are not showing much difference compared to a
> not concurrent implementation. If there would be a large lock somewhere
> in SPI implementation obviously this can be counterintuitive. What would
> be the precautions I would need to consider when working with SPI in
> this manner?
>
> Thanks,
> Tom
Dear Hackers
I have run some tests to try and better highlight my issue as I am still
struggling a lot with it.
I have 4 'CREATE TABLE AS' statements of this nature: "CREATE TABLE
<different_tbl_name> AS <same_query>". This means that I have different
table names for the same query.
I am spawning a number of dynamic background workers to execute each
statement. When I spawn 4 workers on a quad-core machine, the resutling
execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}.
However, when I have just one worker, the results are {0.151s, 0.141s,
0.146s, 0.136s}.
The way I am executing my statements is through SPI in each worker
(using a PG extension) as follows:
SPI_connect();
SPI_exec(queryString, 0);
SPI_finish();
In both test cases, SPI_connect/finish are executed 4 times.
What I expect is that with 4 workers, each statements will take approx
0.15s to execute since they are independent from each other. This would
result in approx a 4x speedup. Despite seeing concurrency, I am seeing
that each invdividual statement will take longer to execute. I am
struggling to understand this behavior, what this suggests to me is that
there is a lock somewhere which completely defeats my purpose.
I was wondering how I could execute my CREATE TABLE statements in a
parallel fashion given that they are independent from each other. If the
lock is the problem, what steps could I take to relax it? I would
greatly appreciate any guidance or insights on this topic.
Thanks,
Tom
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-02-22 00:28:02 | Re: Parallel copy |
Previous Message | Magnus Hagander | 2020-02-21 21:40:04 | Re: Do we ever edit release notes after release to add warnings about incompatibilities? |