Re: Importing a Large .ndjson file

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sankar P <sankar(dot)curiosity(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Importing a Large .ndjson file
Date: 2020-06-17 15:18:21
Message-ID: 1446220.1592407101@sss.pgh.pa.us
Lists: pgsql-general

Sankar P <sankar(dot)curiosity(at)gmail(dot)com> writes:
> I have a .ndjson file. It is a new-line-delimited JSON file. It is
> about 10GB and has about 100,000 records.
> Some sample records:
> { "key11": "value11", "key12": [ "value12.1", "value12.2"], "key13": {
> "k111": "v111" } } \n\r
> { "key21": "value21", "key22": [ "value22.1", "value22.2"] }

> What is the best way to do this on a postgresql database, deployed in
> kubernetes, with a 1 GB RAM allocated ?

It looks like plain old COPY would do this just fine, along the lines
of (in psql)

\copy myTable(content) from 'myfile.ndjson'

If the newlines actually are \n\r rather than the more usual \r\n,
you might have to clean that up first, to stop COPY from treating each
pair as two line endings rather than one.
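
If the file does turn out to use \n\r, one way to clean it up before the
import is a small streaming filter. The following is only a sketch under
that assumption: the function name and the chunk size are made up for
illustration, and it reads the file in fixed-size chunks so that memory use
stays small no matter how large the file is.

```python
def normalize_newlines(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, turning every \\n\\r pair into a single \\n.

    Hypothetical helper: works on raw bytes and never holds more than
    roughly one chunk in memory.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        carry = b""
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                # End of input: flush a held-back newline, if any.
                dst.write(carry)
                break
            data = carry + chunk
            # Hold back a trailing \n, since its \r partner may be the
            # first byte of the next chunk.
            if data.endswith(b"\n"):
                data, carry = data[:-1], b"\n"
            else:
                carry = b""
            dst.write(data.replace(b"\n\r", b"\n"))
```

The cleaned copy can then be given to \copy in place of the original file.
A file that already uses plain \n line endings passes through unchanged.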

I'd advise extracting the first hundred or so lines of the file and doing
a test import into a temporary table, just to verify the process.
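
Extracting the sample could look something like the sketch below. The
function name and the file names are assumptions for illustration; it
copies only the first n_lines lines and stops reading after that, so the
rest of the 10GB file is never touched.

```python
from itertools import islice


def write_sample(src_path, dst_path, n_lines=100):
    """Copy the first n_lines lines of src_path to dst_path.

    Hypothetical helper: returns the number of lines actually written,
    which is smaller than n_lines if the source file is shorter.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Iterating a binary file yields one \n-terminated line at a time.
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

For example, write_sample('myfile.ndjson', 'sample.ndjson') would produce a
small file that can be loaded with the same \copy command, pointed at a
temporary table instead of myTable.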

regards, tom lane
