From: | Andrei Zhidenkov <andrei(dot)zhidenkov(at)n26(dot)com> |
---|---|
To: | Ertan Küçükoğlu <ertan(dot)kucukoglu(at)1nar(dot)com(dot)tr> |
Cc: | pinker <pinker(at)onet(dot)eu>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Loading 500m json files to database |
Date: | 2020-03-23 11:59:37 |
Message-ID: | 1468923B-049B-4EE6-A2BD-79650CC04149@n26.com |
Lists: | pgsql-general |
Try writing a stored procedure (perhaps in PL/Python) that accepts an array of JSON objects, so that the data can be loaded in chunks of 100-1000 files. That should be faster than one call per file.
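A minimal sketch of the chunking idea, assuming bash, an `xargs` that supports `-0` and `-P` (GNU or BSD), connection settings in the usual `PG*` environment variables, and that each file holds exactly one JSON document on a single line. It swaps the PL/Python procedure for plain client-side batching: each `psql` process streams a whole batch of files through one `\copy`, so a batch costs one connection and one transaction instead of one per file, and `xargs -P` caps how many run at once (the per-file loop quoted below starts a new background `psql` for every file, with no limit). `BATCH_SIZE`, `JOBS`, `emit_batch` and `load_batch` are hypothetical names. Using CSV format with the control characters `\x01`/`\x02` as QUOTE and DELIMITER is a common workaround so that COPY does not reinterpret backslash escapes inside the JSON; valid JSON never contains those raw bytes.

```shell
#!/usr/bin/env bash
# Sketch (assumptions noted above): load many one-line JSON files in batches
# with bounded parallelism into json_parts(json_data) from the original post.
set -euo pipefail

BATCH_SIZE=${BATCH_SIZE:-1000}   # hypothetical knob: files per psql/COPY
JOBS=${JOBS:-8}                  # hypothetical knob: concurrent psql processes

# Print each file as exactly one line; awk terminates every record with a
# newline, so a file without a trailing newline is not glued to the next one.
emit_batch() {
  awk 1 "$@"
}

# Stream one batch into a single \copy (one connection, one transaction).
# CSV with unused control characters as QUOTE and DELIMITER keeps each line
# as a single field and leaves backslashes in the JSON untouched.
load_batch() {
  emit_batch "$@" | psql -q -v ON_ERROR_STOP=1 -c \
    "\copy json_parts(json_data) FROM pstdin WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02')"
}
export -f emit_batch load_batch

# Usage: ./load_json.sh datafiles   (hypothetical script name)
if [ "$#" -gt 0 ]; then
  find "$1" -type f -print0 |
    xargs -0 -n "$BATCH_SIZE" -P "$JOBS" bash -c 'load_batch "$@"' _
fi
```

Because each batch is a single COPY, one malformed document makes its whole batch fail; smaller batches trade some throughput for cheaper retries.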
> On 23. Mar 2020, at 12:49, Ertan Küçükoğlu <ertan(dot)kucukoglu(at)1nar(dot)com(dot)tr> wrote:
>
>
>> On 23 Mar 2020, at 13:20, pinker <pinker(at)onet(dot)eu> wrote:
>>
>> Hi, do you perhaps have an idea how to make the loading process faster?
>>
>> I have 500 million JSON files (one JSON document per file) that I need to load
>> into the db.
>> My test set is "only" 1 million files.
>>
>> What I came up with now is:
>>
>> time for i in datafiles/*; do
>> psql -c "\copy json_parts(json_data) FROM $i"&
>> done
>>
>> which is the fastest so far, but it's not what I expected. Loading 1M files
>> takes me ~3h, so loading 500 times more is just unacceptable.
>>
>> some facts:
>> * the target db is in the cloud, so tricks like turning fsync off are not
>> an option
>> * Postgres version 11
>> * I can spin up a huge Postgres instance if necessary, in terms of CPU/RAM
>> * I already tried hash partitioning (writing to 10 different tables instead
>> of 1)
>>
>>
>> Any ideas?
> Hello,
>
> I may not be knowledgeable enough to answer your question.
>
> However, if possible, you might consider doing all the loading on a local physical machine and afterwards doing a backup/restore onto the cloud system.
>
> A compressed backup will generate far less internet traffic than inserting the data directly.
>
> Moreover, on a local machine you can use the additional tricks you mentioned.
>
> Thanks & regards,
> Ertan
>
>
>
>