Re: Blocking every 20 sec while mass copying.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Benjamin Dugast <bdugast(at)excilys(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Blocking every 20 sec while mass copying.
Date: 2014-07-18 17:56:35
Message-ID: CAMkU=1yzvrh=3R0rP+uRdAXywxh-=BVvT0ZaDWBJ43GLeEnSVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Jul 18, 2014 at 3:52 AM, Benjamin Dugast <bdugast(at)excilys(dot)com>
wrote:

> Hello,
>
> I'm working on Postgres 9.3.4 for a project.
>
> We are using Scala, Akka and JDBC to insert data in the database, we have
> around 25M insert to do which are basically lines from 5000 files. We issue
> a DELETE according to the file (mandatory) and then a COPY each 1000 lines
> of that file.
>
> *DELETE request :* DELETE FROM table WHERE field1 = ? AND field2 = ?;
> *COPY request :* COPY table FROM STDIN WITH CSV
>
> We have indexes on our database that we can't delete to insert our data.
>

Inserting data into large indexed tables will usually dirty a prodigious
amount of data in a random manner, to maintain those indexes. It will take
a very long time to clear that data down to spinning disks, because the
writes cannot be effectively combined into long sequences (sometimes they
theoretically could be combined, but the kernel just fails to do a good job
of doing so).

Buy a good IO system, RAID with lots of disks, or maybe SSD, for your
indexes.

If the freezes occur mostly at checkpoint sync time, then you can try
making the checkpoint interval much longer. Checkpoints will still suck
when they do happen, but that happens less often. Depending on the
details of your system of your data and your loading processes, they might
freeze for N times longer if you make them N times less frequent, such that
the total amount of freezing time is conserved. Or they might freeze for
just the same period, so that total freezing time is reduced by a factor of
N. It is hard to know without trying it. You could also try lowering
the /proc/sys/vm/dirty_background_bytes setting, so that the kernel starts
writing things out *before* the end-of-checkpoint sync calls start landing.

If the freezes aren't correlated with checkpoints, you could try increasing
the shared_buffers to take up most of your RAM. This is unconventional
advice, but I've seen it do wonders for such loads when the indexes that
need maintenance are about the same size as RAM.

If you can partition your tables so that only one partition is being
actively loaded at a time, that could be very effective if the indexes for
each partition would then be small enough to fit in memory.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Albe Laurenz 2014-07-21 08:02:37 Re: Blocking every 20 sec while mass copying.
Previous Message Guillaume Cottenceau 2014-07-18 15:16:54 Re: Blocking every 20 sec while mass copying.