From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Siddharth Jain <siddhsql(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to load data from CSV into a table that has array types in its columns?
Date: 2022-10-27 00:59:49
Message-ID: 8830ee61-cf09-7ba6-0055-efe7a910ac59@aklaver.com
Lists: pgsql-general
On 10/26/22 17:30, Siddharth Jain wrote:
> Hello,
>
> Given a long list like this:
>
> 1,2
> 2,4 --> 2 appears once
> 7,9
> 8,9
> 5,3
> 2,5 --> note 2 appears twice
>
> I want to load it into this table:
>
> create table tbl (
> id integer primary key,
> fids integer[]
> )
>
> so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids
>
> My actual dataset is very large and has 100M rows in it. How can it be
> efficiently loaded into postgres?
>
> I think I have to use a program for this and am trying to use the pg
> library for Node.js. I am reading the data in batches of 1M or 100k rows,
> for example, and storing them in a hashmap (dictionary) in Node.js. The
> part where I am stuck is how to generate the SQL command:
>
> 1. I want to make one call to the server for the batch of 1M rows, not
> 1M calls
> 2. Some keys in the hashmap might already exist in the database. For
> these keys we want to append to the array
> 3. Some keys will not exist and for these we want to insert new rows
>
> Can someone help me please? I am using Postgres for the first time.
The best way to deal with importing large batches of data is to use COPY
https://www.postgresql.org/docs/current/sql-copy.html
But that will not play well with modifying the data as you input it.
What I can see doing is:
1) COPY the data into a staging table:
create table staging_tbl (id integer, fid integer);
2) Then use SQL statements to move the data into the final table.
As an example of one possibility, using ON CONFLICT from here:
https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT
insert into tbl
   select id, array_agg(fid) from staging_tbl group by id
   on conflict (id) do update set fids = tbl.fids || excluded.fids;
The array_agg/group by collapses ids that appear more than once in a
batch; without it, ON CONFLICT DO UPDATE errors out when the same id
would be affected twice in one INSERT.
I would test with a smaller example data set to verify.
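Putting the two steps together, a minimal end-to-end sketch. It assumes
the CSV is the two-column file shown above, sitting at /path/to/fids.csv
on the client machine, and that you run it from psql (the file path, the
client-side \copy, and the per-batch truncate are my assumptions, not
something you stated):

   -- staging table for the raw CSV rows
   create table staging_tbl (id integer, fid integer);

   -- client-side load; \copy is a psql meta-command and reads the file
   -- from the machine psql runs on (server-side COPY would need the
   -- file to be on the database server)
   \copy staging_tbl (id, fid) from '/path/to/fids.csv' with (format csv)

   -- collapse duplicate ids, then insert new rows or append to existing arrays
   insert into tbl
      select id, array_agg(fid) from staging_tbl group by id
      on conflict (id) do update set fids = tbl.fids || excluded.fids;

   -- empty the staging table before loading the next batch
   truncate staging_tbl;

If you want to drive this from Node.js rather than psql, COPY ... FROM
STDIN can be streamed over the same connection (the pg-copy-streams
package is one option), and the insert statement stays the same.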
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com