From: | Christoph Moench-Tegeder <cmt(at)burggraben(dot)net> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Using COPY to import large xml file |
Date: | 2018-06-24 17:45:44 |
Message-ID: | 20180624174543.qj6yvp5x7koaql6k@squirrel.exwg.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
## Adrien Nayrat (adrien(dot)nayrat(at)anayrat(dot)info):
> I used this tool :
> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
That will be awfully slow: this tool commits each INSERT on it's own,
see loop in
https://github.com/Networks-Learning/stackexchange-dump-to-postgres/blob/master/load_into_pg.py#L83
With only small changes - prepare the INSERT, execute for all (or at
least a huge lot of) rows, COMMIT at the end - you can safe quite a lot
of time (500 rows is not "a huge lot"). And when you do that, for
heaven's sake, do not try to create the INSERT statement as a string
with the values - Bobby Tables will eat you. See psycopg documentation
on how it's done (especially watch the warnings):
http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
On prepared statements with psycopg2, see
http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
(python makes stuff rather hard, but I'll spare you the snark and wait until
the language has matured to at least version 5).
Using the COPY protocol with psycopg2 seems to require some hoop-jumping,
but could improve matters even more.
Regards,
Christoph
--
Spare Space.
From | Date | Subject | |
---|---|---|---|
Next Message | Tim Cross | 2018-06-24 22:14:28 | Re: Using COPY to import large xml file |
Previous Message | Adrian Klaver | 2018-06-24 16:57:58 | Re: Using COPY to import large xml file |