From: | Steve Midgley <science(at)misuse(dot)org> |
---|---|
To: | Shaozhong SHI <shishaozhong(at)gmail(dot)com> |
Cc: | pgsql-sql <pgsql-sql(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres |
Date: | 2021-10-04 23:34:27 |
Message-ID: | CAJexoSJ8nXoO9mxORayYrhiP+AOwz0Sifj8Y=x02cHLFCV9bVA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-sql |
On Mon, Oct 4, 2021 at 1:19 PM Shaozhong SHI <shishaozhong(at)gmail(dot)com> wrote:
> Hello, Steve,
>
> That is interesting.
>
> Send me the link for Python doing "Copy---From....CSV".
>
> I would like to review and test.
>
> Regards,
>
> David
>
> On Mon, 4 Oct 2021 at 18:30, Steve Midgley <science(at)misuse(dot)org> wrote:
>
>>
>>
>> On Mon, Oct 4, 2021 at 8:55 AM Shaozhong SHI <shishaozhong(at)gmail(dot)com>
>> wrote:
>>
>>> Has anyone tested this one?
>>> A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
>>> Valentiner
>>> <https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>
>>>
>>> I tried psql_insert_copy method, but I got the following error message.
>>>
>>> to_sql() got an unexpected keyword argument 'method'
>>>
>>>
>>> Can anyone shed light on this?
>>>
>>>
>> Interesting - I wasn't aware there was a faster method than "COPY ...
>> FROM ... CSV"
>>
>> Maybe my best input is that I've always found COPY/FROM/CSV to be
>> insanely fast and really easy to write from any standard console or ORM, so
>> maybe the fractional performance increase from other methods isn't worth
>> it, given how performant, reliable and easy to use this approach is? Not
>> really an answer but food for thought..
>>
>>
>> The way I've implemented that (using Ruby, but very similar) is to
actually copy from STDIN rather than CSV. In my implementation I wasn't
running my import code on the Postgres server, so I couldn't get the CSV
file onto the local DB machine. So I had to feed the data over the wire via
STDIN. This is slower than if you can point Postgres to a CSV file that the
Pg server itself can access on the file system, but it's more flexible and
still remarkably fast. It can be even (a bit) faster if you don't feed
STDIN line by line, but in my experience it's a nightmare to debug if you
don't get a line number when an import file barfs.
Anyway, here's the Ruby code that shows the fast import concept:
https://gist.github.com/science/393907d4123c87ed767bc81e9dd5a7da
I wrote this a LONG time ago, but I think the concepts are still relevant.
IIRC, feeding Pg from STDIN was ~400x faster than using "insert into" or
other SQL type commands to get the data in.
I hope this is helpful - I would think this code is easily portable to
Python, as you just need to get a raw connection to Postgres from your ORM
adapter and then model the commands the same way in the code above. Good
luck and write with questions..
Steve
From | Date | Subject | |
---|---|---|---|
Next Message | Ryan Booz | 2021-10-05 00:49:14 | Re: Growth planning |
Previous Message | Dan Davis | 2021-10-04 22:57:25 | Re: How to build psycopg2 for Windows |
From | Date | Subject | |
---|---|---|---|
Next Message | Shaozhong SHI | 2021-10-05 13:05:56 | Is it possible to combine the power of Postgres and Python in do statement? |
Previous Message | Shaozhong SHI | 2021-10-04 20:13:25 | Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres |