From: pascal(at)ensieve(dot)org
To: psycopg(at)postgresql(dot)org
Subject: Fastest way to insert/update many rows
Date: 2014-08-12 13:46:46
Message-ID: A54AEDB1-20E3-4B18-A91B-F9F3C2A603B7@ensieve.org
Lists: psycopg
Hi,
I'd like to use psycopg2 to fetch a large number of rows (hundreds of millions), perform some computations on them, and put them back into the database.
I can fetch about 130k rows/sec with
cur.execute('select * from stuff')
keyvals = list(cur)
and 100k/sec with
f = io.StringIO()
cur.copy_to(f, 'stuff')
f.seek(0)
keyvals = list(tuple(map(int, l.split('\t'))) for l in f)
but inserting using
cur.executemany('insert into stuff values (%s, %s)', keyvals)
only has a throughput of 23k/sec with ca. 20% CPU used by Python, 80% by Postgres, while
cur.copy_from(io.StringIO('\n'.join('{}\t{}'.format(*r) for r in keyvals)), 'stuff')
manages to insert 1.8M/sec.
I can't quite believe that generating a string should be the fastest method; am I missing something?
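At least the intermediate string doesn't have to hold the whole data set at once; a chunked variant like this should keep memory bounded (untested sketch; copy_in_chunks is just a helper I made up, and the chunk size is arbitrary):
import io
import itertools

def copy_in_chunks(cur, rows, table, size=100000):
    # feed COPY in fixed-size chunks instead of one giant string;
    # each chunk becomes its own COPY command on the same table
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            break
        buf = io.StringIO('\n'.join('{}\t{}'.format(*r) for r in chunk))
        cur.copy_from(buf, table)

copy_in_chunks(cur, keyvals, 'stuff')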
What I'd really like to do is
cur.executemany('update stuff set value = %s where key = %s', ...)
but that was orders of magnitude slower still; probably because the keys arrive in random order, so Postgres performs an index lookup for each one.
Populating a temporary table and using 'update stuff ... from temptable ...' is quicker.
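For reference, the temp-table variant looks roughly like this (sketch; 'temptable' and the integer columns are just placeholders from my example):
# stage the new values in a temp table via COPY,
# then update with a single join
cur.execute('create temp table temptable (key int, value int) on commit drop')
cur.copy_from(io.StringIO('\n'.join('{}\t{}'.format(*r) for r in keyvals)),
              'temptable')
cur.execute('update stuff set value = temptable.value '
            'from temptable where stuff.key = temptable.key')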
I only have to set one column in each row; is there a way to update through a cursor, like PL/pgSQL's
update <table> set ... where current of <cursor>
i.e. iterate through the rows in the most efficient way for the database.
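In psycopg2 terms I imagine something like the following, if named (server-side) cursors support positioned updates at all (untested sketch; compute() stands in for my computation):
read_cur = conn.cursor('stuff_cur')  # named cursor: DECLAREd on the server
read_cur.itersize = 1                # fetch one row at a time, otherwise
                                     # CURRENT OF points at the last row of
                                     # the fetched batch, not the one iterated
read_cur.execute('select key, value from stuff for update')
write_cur = conn.cursor()
for key, value in read_cur:
    write_cur.execute('update stuff set value = %s where current of stuff_cur',
                      (compute(value),))
Though with itersize = 1 that's one round trip per row, which probably defeats the purpose.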
Or would it be wiser to use PL/Python for this kind of task instead?
--
Pascal Germroth