Re: autocommit (true/false) for more than 1 million records

From: Felipe Santos <felipepts(at)gmail(dot)com>
To: emilu(at)encs(dot)concordia(dot)ca
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>, david(dot)g(dot)johnston(at)gmail(dot)com
Subject: Re: autocommit (true/false) for more than 1 million records
Date: 2014-08-25 14:02:52
Message-ID: CAPYcRiURd0Zzg-Oo9R0126xideKz42A+tNNem+a0LW0rwzc=-w@mail.gmail.com
Lists: pgsql-performance

Hi Emi,

Databases that comply with the ACID standard (
http://en.wikipedia.org/wiki/ACID) ensure that there is no data loss by
first writing data changes to the database log, as opposed to updating
the actual data on the filesystem first (in the datafiles).

Each database has its own way of doing it, but it basically consists of
writing the data to the logfile at each COMMIT and writing the data to the
datafile only when it's necessary.

So the COMMIT command is a way of telling the database to write the data
changes to the logfile.
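
For instance, in JDBC terms the difference looks roughly like this (a
sketch only; the connection details and the table "t" are placeholders,
not something from this thread):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CommitDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {

            // Autocommit on: every statement is its own transaction, so
            // each INSERT ends with an implicit COMMIT (one log write per row).
            conn.setAutoCommit(true);
            stmt.executeUpdate("INSERT INTO t VALUES (1)");
            stmt.executeUpdate("INSERT INTO t VALUES (2)");

            // Autocommit off: both INSERTs share one transaction, and the
            // explicit commit() is the single point where the changes are
            // written to the logfile.
            conn.setAutoCommit(false);
            stmt.executeUpdate("INSERT INTO t VALUES (3)");
            stmt.executeUpdate("INSERT INTO t VALUES (4)");
            conn.commit();
        }
    }
}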

Both logfiles and datafiles reside on the filesystem, so why is writing
to the logfile faster?

It is because the logfile is written sequentially, while the datafile is
written in a scattered fashion and may even be fragmented.

To sum up: autocommit false is faster because you avoid going to the
hard disk to write the changes to the logfile for every single row; you
keep them in RAM until you decide to commit them to the logfile (every
10K rows, for instance).

Be aware that eventually you will need to write the data to the logfile,
so you can't avoid that. But performance is usually better if you write
X rows at a time to the logfile rather than writing each and every row
one by one (because of the per-write overhead on the hard disk).

The number of rows per commit that gives the best performance depends on
your environment and is pretty much found by trial and error. For
millions of rows, I usually commit every 10K or 50K rows.
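
To make that concrete for the JDBC case in your first message, here is a
minimal sketch of option (2), committing every N rows. The table, the
columns, the connection details and the 10K batch size are placeholders
you would adjust for your own environment:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchedInsert {
    private static final int COMMIT_EVERY = 10000; // try 10K or 50K

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false); // option (2): commit manually

            String sql = "INSERT INTO bulk_data (col1, col2) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 1000000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row " + i);
                    ps.addBatch();

                    if ((i + 1) % COMMIT_EVERY == 0) {
                        ps.executeBatch();
                        conn.commit(); // one log flush for the whole batch
                    }
                }
                ps.executeBatch(); // send whatever rows are left
                conn.commit();
            } catch (SQLException e) {
                conn.rollback(); // undo the current, uncommitted batch
                throw e;
            }
        }
    }
}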

Regards,

Felipe

2014-08-25 10:40 GMT-03:00 Emi Lu <emilu(at)encs(dot)concordia(dot)ca>:

> Good morning,
>
>>>>> Trying to insert into one table with 1 million records through java
>>>>> JDBC into psql8.3. May I know (1) or (2) is better please?
>>>>>
>>>>> (1) set autocommit(true)
>>>>> (2) set autocommit(false)
>>>>> commit every n records (e.g., 100, 500, 1000, etc)
>>>>>
>>>> It depends on what you need.
>>>>
>>>> Data will be available to concurrent processes earlier with (1), while
>>>> (2) will go faster.
>>>>
>>> No need to worry about the locks/losing records because after data
>>> loading we will do a check. For now, I'd like the fastest way. Would
>>> you suggest committing every 1000 or 3000 records?
>>>
>> The improvement drops off pretty quickly in my experience, but it
>> depends on the size of the records and other things.
>>
> The table is huge with almost 170 columns.
>
>> Try it and see...? It's almost certainly going to depend on your
>> specific environment.
>>
> Can you let me know what are the "specific environment" please? Such as:
> ......
>
> By the way, could someone let me know why set autocommit(false) is for
> sure faster than true, please? Or point me to some online docs that
> talk about this.
>
> Thanks a lot!
> Emi
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
