Re: Using COPY to import large xml file

From: Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Using COPY to import large xml file
Date: 2018-06-24 17:45:44
Message-ID: 20180624174543.qj6yvp5x7koaql6k@squirrel.exwg.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

## Adrien Nayrat (adrien(dot)nayrat(at)anayrat(dot)info):

> I used this tool :
> https://github.com/Networks-Learning/stackexchange-dump-to-postgres

That will be awfully slow: this tool commits each INSERT on it's own,
see loop in
https://github.com/Networks-Learning/stackexchange-dump-to-postgres/blob/master/load_into_pg.py#L83

With only small changes - prepare the INSERT, execute for all (or at
least a huge lot of) rows, COMMIT at the end - you can safe quite a lot
of time (500 rows is not "a huge lot"). And when you do that, for
heaven's sake, do not try to create the INSERT statement as a string
with the values - Bobby Tables will eat you. See psycopg documentation
on how it's done (especially watch the warnings):
http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
On prepared statements with psycopg2, see
http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
(python makes stuff rather hard, but I'll spare you the snark and wait until
the language has matured to at least version 5).

Using the COPY protocol with psycopg2 seems to require some hoop-jumping,
but could improve matters even more.

Regards,
Christoph

--
Spare Space.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tim Cross 2018-06-24 22:14:28 Re: Using COPY to import large xml file
Previous Message Adrian Klaver 2018-06-24 16:57:58 Re: Using COPY to import large xml file