RE: Loading 500m json files to database

From: Kevin Brannen <KBrannen(at)efji(dot)com>
To: pinker <pinker(at)onet(dot)eu>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: RE: Loading 500m json files to database
Date: 2020-03-24 17:29:23
Message-ID: SA0PR19MB42555877CA8D229BA3F405ACA4F10@SA0PR19MB4255.namprd19.prod.outlook.com
Lists: pgsql-general

From: pinker <pinker(at)onet(dot)eu>

> it's a cloud and no plpythonu extension available unfortunately

You're misunderstanding him. See David's post for an example, but the point was that you can control all of this from an *external* Perl, Python, Bash, whatever program on the command line at the shell.

In pseudo-code, probably fed by a "find" command piping filenames to it:

while more file names remain
    do { read in a file name & add it to the list } while (list.length < 1000 and more file names remain);
    process the entire list with \copy commands fed to 1 psql command
    empty the list

I've left all kinds of checks out of that, but that's the basic thing that you need; implement it in whatever scripting language you're comfortable with.
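For what it's worth, here is a rough Python sketch of that loop. Treat it as an illustration under assumptions, not a tested loader: the table name json_docs, the column name doc, the database name mydb and the function names are all made up for the example. It also assumes each file is a single-line JSON document that text-format \copy can load as-is; multi-line files or files containing backslashes would need preprocessing first. File names with quotes, backslashes or newlines are simply rejected rather than escaped.

```python
import subprocess
from itertools import islice

BATCH_SIZE = 1000  # list length used in the pseudo-code above


def batches(names, size=BATCH_SIZE):
    """Yield lists of up to `size` items from any iterable of file names."""
    it = iter(names)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_script(paths, table="json_docs", column="doc"):
    """Build one psql script containing a \\copy line per file.

    The table and column names are hypothetical placeholders.
    """
    lines = []
    for path in paths:
        # Conservative check: refuse names this sketch does not know how to quote.
        if any(ch in path for ch in "'\\\n"):
            raise ValueError(f"unsupported character in file name: {path!r}")
        lines.append(f"\\copy {table} ({column}) from '{path}'")
    return "\n".join(lines) + "\n"


def load_all(names, dbname="mydb"):
    """Start one psql process per batch and feed it the \\copy script on stdin."""
    cleaned = (name.rstrip("\n") for name in names if name.strip())
    for chunk in batches(cleaned):
        subprocess.run(
            ["psql", "-X", "-q", "-v", "ON_ERROR_STOP=1", "-d", dbname],
            input=copy_script(chunk),
            text=True,
            check=True,  # raise if psql exits with an error
        )
```

To feed it from "find", add a call such as load_all(sys.stdin) at the bottom of the script (with "import sys") and pipe the file names in, one per line. Connection settings are left to the usual PG* environment variables.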

HTH,
Kevin
