Re: autocommit (true/false) for more than 1 million records

From: Emi Lu <emilu(at)encs(dot)concordia(dot)ca>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Alex Goncharov <alex(dot)goncharov(dot)usa(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: autocommit (true/false) for more than 1 million records
Date: 2014-08-28 14:49:08
Message-ID: 53FF4164.9070102@encs.concordia.ca
Lists: pgsql-performance

Hello All,

I learned a lot from all of your inputs. To share one more thing I tried with
Java/JDBC to bypass autocommit:
(1) Read and save the source data into f1.csv, f2.csv, ......
(2) Copy/load them into the destination PostgreSQL DB:
// conn_psql is an open PostgreSQL JDBC connection
CopyManager cm = new CopyManager((BaseConnection) conn_psql);
try (FileReader fileReader = new FileReader("f1.csv")) {
    cm.copyIn("COPY table_name FROM STDIN WITH DELIMITER '|'",
        fileReader);
}
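To load all the files from step (1) the same way, the CopyManager call can be wrapped in a loop. A minimal sketch, assuming an open pgJDBC connection `conn_psql`; the class and method names (`BulkCsvLoader`, `loadFiles`, `copySql`) are mine, not from this thread:

```java
import java.io.FileReader;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

// Hypothetical helper: bulk-load several CSV files, one COPY per file,
// via the pgJDBC CopyManager (so rows are streamed, not INSERTed one by one).
public class BulkCsvLoader {

    // Build the COPY statement used above; pulled out so it can be reused.
    static String copySql(String tableName) {
        return "COPY " + tableName + " FROM STDIN WITH DELIMITER '|'";
    }

    // Returns the total number of rows loaded across all files.
    static long loadFiles(java.sql.Connection conn_psql, String tableName,
                          String... csvPaths) throws Exception {
        CopyManager cm = new CopyManager((BaseConnection) conn_psql);
        long total = 0;
        for (String path : csvPaths) {
            try (FileReader fileReader = new FileReader(path)) {
                // copyIn streams the file and returns the row count for this COPY
                total += cm.copyIn(copySql(tableName), fileReader);
            }
        }
        return total;
    }
}
```

Called as, e.g., `BulkCsvLoader.loadFiles(conn_psql, "table_name", "f1.csv", "f2.csv")`. Each COPY runs as a single statement, so there is no per-row commit overhead either way.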

Emi

On 08/27/2014 08:59 AM, Kevin Grittner wrote:
> Alex Goncharov <alex(dot)goncharov(dot)usa(at)gmail(dot)com> wrote:
>> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>> The rows will all be in the table, but not visible to any other
>>> transaction.
>> How much data can I fit there while doing COPY? Not 1 TB?
> As has already been said, why not? This is not some special
> section of the table -- the data is written to the table. Period.
> Commit or rollback just tells new transactions whether data flagged
> with that transaction number is visible.
>
> Nobody can tell you how much space that will take -- it depends on
> many factors, including how many columns of what kind of data, how
> compressible it is, and how it is indexed. But the point is, we
> are not talking about any separate space from what is needed to
> store the data in the database.
>
> FWIW, I think the largest single COPY statement I ever ran was
> generated by pg_dump and piped directly to psql for a major release
> upgrade (before pg_upgrade was available), and it was somewhere in
> the 2TB to 3TB range. It took a long time, but it "just worked".
> That should be true for 10TB or 100TB, as long as you have sized
> the machine correctly and are loading clean data. Whether you have
> that covered, and how you want to "hedge your bets" based on your
> degree of confidence in those things is a judgment call. When I'm
> in the position of needing to make such a call, I like to do some
> tests.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
