connecting multiple INSERT CTEs to same record?

From: Assaf Gordon <assafgordon(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: connecting multiple INSERT CTEs to same record?
Date: 2021-10-19 07:19:53
Message-ID: 68708e28-dd17-13d6-2acd-9ca4f9735379@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I'm looking for a way to insert items to multiple tables (connected with
foreign keys) using CTEs.

I found few similar questions on stack-overflow and the mailing list,
but no solution.

Please consider this contrived example:
A table of students, and a table of classes they are in:
====
create temp table students(
id serial primary key,
name varchar);

create temp table classes(
id serial primary key,
student_id int not null references students(id),
subject varchar);
====

And,
I am given a list of new students and their classes (from an external
source, imported into a temp table):

====
create temp table new_data (
name varchar,
subject varchar);

insert into new_data values
('John','Math'),
('Jane','Physics'),
('Moe','Science'),
('John','English'); -- different student with same name
====

I want to first create new 'students' record, and then create a
corresponding 'classes' record. For that - I need both the newly
assigned 'id', and they corresponding source record from 'new_data'.

A trivial usage of "insert ... returning *" doesn't work - it can't
return columns that were not directly inserted (and the 'subject' string
wasn't inserted):

===
with
new_students as (
insert into students(name)
select name from new_data
returning *
)

insert into classes(student_id, subject)
select new_students.id,
-- how to get the subject matching to
--- the newly inserted student?
??????
from new_students ;
===

I found a work-around, but it relies on a flimsy assumption - that
within this transaction, the CTIDs for the 'new_data' and the newly
inserted 'students' rows maintain the same order.
Using this assumption, I calculate a unique value for each row with
'row_number() over (order by ctid)',
assuming that the order stays the same,
then join the tables to match the new student to its 'subject':

===
with
new_data_with_order as (
select
row_number() over (order by ctid) as new_order,
*
from new_data )
,
new_students as (
insert into students(name)
select name from new_data_with_order
returning ctid,*
)
,
new_students_with_order as (
select
row_number() over (order by ctid) as new_order,
*
from new_students
)
,

merged_data as (
-- Is this correct??
-- it is based on the assumption that the order
-- of inserted rows into 'students' (and hence, their CTID order)
-- is the same as the order of the rows in 'new_data'
-- and their CTID.
select *
from new_students_with_order
join new_data_with_order
on
new_students_with_order.new_order =
new_data_with_order.new_order
)

-- for troubleshooting:
-- select * from merged_data ;

insert into classes(student_id, subject)
select
id, -- merged_data.id is the newly assigned student.id primary key.
subject
from merged_data ;

===

So my question is:
Is this assumption about CTID maintaining order (within the transaction)
correct? Even when considering concurrency?
Even if in strange coincidence 'VACCUUM' is running in parallel to this
query?

Or,
Is there another way to achieve this? consider that this is a contrived
example, in my use-case I need to update several tables.
Of course it can be done on the application-level, one-by-one inserting
new 'student' then inserting its 'classes' - but I prefer to avoid that.

Sadly, I can't assume the student name is unique, so I can't "join" on it.

Thanks!
- Assaf Gordon

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pawan Sharma 2021-10-19 09:28:33 spannerdb migration to PostgreSQL
Previous Message David G. Johnston 2021-10-19 05:25:25 Where is the tsrange() function documented?