RE: Parallel INSERT SELECT take 2

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Parallel INSERT SELECT take 2
Date: 2021-04-27 02:09:01
Message-ID: OS0PR01MB5716A603121216A70266A86594419@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Currently, index expression and predicate are stored in text format.
> > We need to use stringToNode(expression/predicate) to parse it.
> > Some committers think doing this twice does not look good, unless we
> > found some ways to pass parsed info to the executor to avoid the second
> parse.
>
> How much is the extra cost that's added if we do stringToNode twiice?
> Maybe, we can check for a few sample test cases by just having
> stringToNode(expression/predicate) in the planner and see if it adds much to
> the total execution time of the query.

OK, I will do some test on it.

> > Yes, because both parent table and child table will be inserted and
> > the parallel related objects on them will be executed. If users want
> > to make sure the parallel insert succeed, they need to check all the objects.
>
> Then, running the pg_get_parallel_safety will have some overhead if there are
> many partitions associated with a table. And, this is the overhead planner
> would have had to incur without the declarative approach which we are trying
> to avoid with this design.

Yes, I think put such overhead in a separate function is better than in a common path(planner).

> > Foreign table itself is considered as parallel restricted, because we
> > do not support parallel insert fdw api for now.
>
> Maybe, the ALTER TABLE ... SET PARALLEL on a foreign table should default to
> parallel restricted always and emit a warning saying the reason?

Thanks for the comment, I agree.
I will change this in the next version patches.

> > But the ALTER PARALLEL command itself does not check all the partition's
> safety flag.
> > The function pg_get_parallel_safety can return all the partition's not
> > safe objects, user should set the parallel safety based the function's result.
>
> I'm thinking that when users say ALTER TABLE partioned_table SET PARALLEL
> TO 'safe';, we check all the partitions' and their associated objects' parallel
> safety? If all are parallel safe, then only we set partitioned_table as parallel safe.
> What should happen if the parallel safety of any of the associated
> objects/partitions changes after setting the partitioned_table safety?

Currently, nothing happened if any of the associated objects/partitions changes after setting the partitioned_table safety.
Because , we do not have a really cheap way to catch the change. The existing relcache does not work because alter function
does not invalid the relcache which the function belongs to. And it will bring some other overhead(locking, systable scan,...)
to find the table the objects belong to.

> My understanding was that: the command ALTER TABLE ... SET PARALLEL TO
> 'safe' work will check the parallel safety of all the objects associated with the
> table. If the objects are all parallel safe, then the table will be set to safe. If at
> least one object is parallel unsafe or restricted, then the command will fail.

I think this idea makes sense. Some detail of the designed can be improved.
I agree with you that we can try to check check all the partitions' and their associated objects' parallel safety when ALTER PARALLEL.
Because it's a separate new command, add some overhead to it seems not too bad.
If there are no other objections, I plan to add safety check in the ALTER PARALLEL command.

> also thinking that how will the design cope with situations such as the parallel
> safety of any of the associated objects changing after setting the table to
> parallel safe. The planner would have relied on the outdated parallel safety of
> the table and chosen parallel inserts and the executor will catch such situations.
> Looks like my understanding was wrong.

Currently, we assume user is responsible for its correctness.
Because, from our research, when the parallel safety of some of these objects is changed,
it's costly to reflect it on the parallel safety of tables that depend on them.
(we need to scan the pg_depend,pg_inherit,pg_index.... to find the target table)

> So, the ALTER TABLE ... SET PARALLEL TO command just sets the target table
> safety, doesn't bother what the associated objects' safety is.
> It just believes the user. If at all there are any parallel unsafe objects it will be
> caught by the executor. Just like, setting parallel safety of the
> functions/aggregates, the docs caution users about accidentally/intentionally
> tagging parallel unsafe functions/aggregates as parallel safe.

Yes, thanks for looking into this.

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-04-27 02:15:36 Re: Parallel INSERT SELECT take 2
Previous Message Michael Paquier 2021-04-27 01:54:02 Re: Addition of authenticated ID to pg_stat_activity