Re: Insertion of large xml files into PostgreSQL 10beta1

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Alain Toussaint <atoussaint1976(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Insertion of large xml files into PostgreSQL 10beta1
Date: 2017-06-23 15:51:16
Message-ID: CAKFQuwZ5UAtSjq=hR6rXfq6V++xOhomADEvOf+7i45D8DD_1sA@mail.gmail.com
Lists: pgsql-general

On Fri, Jun 23, 2017 at 8:19 AM, Alain Toussaint <atoussaint1976(at)gmail(dot)com>
wrote:

> Hello,
>
> I am building up a PostgreSQL server which I intend to load the
> entirety of the pubmed database data (23GB bzip2 compressed, 216GB
> unpacked) which is available in xml form of which, here is an example:
>
> https://www.ncbi.nlm.nih.gov/pubmed/21833294?report=xml&format=text
>
> I looked at the documentation as well as several examples code for
> loading the data and the one example who nearly succeeded is this
> procedure:
>
> /usr/bin/psql medline
>
> \set :largexmlfile: 'cat /srv/pgsql/pubmed/medline17n0001.xml'
> INSERT INTO samples (xmldata) VALUES :largexmlfile:
>

I'll assume you've just mis-keyed this from memory, since the syntax of the
above doesn't look right.
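
For reference, a minimal sketch of what the sequence was probably meant to
be, following the idiom shown in the psql documentation: backticks run a
shell command and store its output in a variable, and :'name' interpolates
that variable as a quoted SQL literal. The table definition here is an
assumption, since it wasn't posted; the file path is the one from your
message.

```sql
-- Assumed table definition (not shown in the original post).
CREATE TABLE IF NOT EXISTS samples (xmldata xml);

-- Backticks make psql run the command and store its output in "content".
\set content `cat /srv/pgsql/pubmed/medline17n0001.xml`

-- :'content' expands to the variable's value as a quoted SQL literal.
INSERT INTO samples (xmldata) VALUES (:'content');
```

The whole file travels as one literal in one statement, so with a file of
this size expect it to be memory-hungry on both the client and the server.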

>
> (from reading the list post here:
> https://www.postgresql.org/message-id/20160624083757.GA5459%40msg.df7cb.de
> )
>
> In which, about 334MB of data from medline17n0001.xml will flood the
> monitor.

If the above general command sequence is done right, and echoing of
commands is turned off, you should not see any of the XML file content
echoed to the output.

>
> it is possible to turn off validation of the content between the xml
> tags of the files.
>
>
You can either turn off validation for the entire file or leave it on -
PostgreSQL isn't recognizing tags here (you haven't defined the samples
table for us...).

Narrowing down the entire file to a small problem region and posting a
self-contained example, or at least providing the error messages and
content, might help elicit good responses. Even if you could load the
data without incident, using it may end up proving problematic. That said,
character encodings and sets are not my strong suit.
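
As a rough sketch of how such a reduced test case could be checked:
assuming the server was built with libxml support, and that an excerpt of
the file has been saved to /tmp/excerpt.xml (a hypothetical path),
xml_is_well_formed_document() reports whether the text parses as an XML
document without raising an error on bad input.

```sql
-- /tmp/excerpt.xml is a hypothetical file holding the suspect region.
\set chunk `cat /tmp/excerpt.xml`

-- Returns true or false instead of failing on malformed input.
SELECT xml_is_well_formed_document(:'chunk') AS well_formed;
```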

David J.
