Quick Links

Re: Using COPY to import large xml file

From:	Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Using COPY to import large xml file
Date:	2018-06-24 17:45:44
Message-ID:	20180624174543.qj6yvp5x7koaql6k@squirrel.exwg.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

## Adrien Nayrat (adrien(dot)nayrat(at)anayrat(dot)info):

> I used this tool :
> https://github.com/Networks-Learning/stackexchange-dump-to-postgres

That will be awfully slow: this tool commits each INSERT on it's own,
see loop in
https://github.com/Networks-Learning/stackexchange-dump-to-postgres/blob/master/load_into_pg.py#L83

With only small changes - prepare the INSERT, execute for all (or at
least a huge lot of) rows, COMMIT at the end - you can safe quite a lot
of time (500 rows is not "a huge lot"). And when you do that, for
heaven's sake, do not try to create the INSERT statement as a string
with the values - Bobby Tables will eat you. See psycopg documentation
on how it's done (especially watch the warnings):
http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
On prepared statements with psycopg2, see
http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
(python makes stuff rather hard, but I'll spare you the snark and wait until
the language has matured to at least version 5).

Using the COPY protocol with psycopg2 seems to require some hoop-jumping,
but could improve matters even more.

Regards,
Christoph

--
Spare Space.

In response to

Re: Using COPY to import large xml file at 2018-06-24 15:53:00 from Adrien Nayrat

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tim Cross	2018-06-24 22:14:28	Re: Using COPY to import large xml file
Previous Message	Adrian Klaver	2018-06-24 16:57:58	Re: Using COPY to import large xml file