Quick Links

Re: really bad insert performance on table with unique index

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: really bad insert performance on table with unique index
Date:	2012-02-02 20:59:40
Message-ID:	CAMkU=1yX59KMciVTMU=8vozE1XLJ-aPhFrXqmoBiD43iO1WhHA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thu, Feb 2, 2012 at 9:28 AM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> I created a table with two columns: an id SERIAL (primary key) and a
> text (not null), and then added a unique index on the text field.
> Then I ran the following query (with a huge work_mem - 20GB):
>
> insert into tableA (text_field) select distinct other_text_field from
> some_huge_set_of_tables

I bet the distinct is being implemented by a hashAggregate. So then
you are inserting the records in a random order, causing the index to
have terrible locality of reference.

Try adding "order by other_text_field" to the select. Or don't create
the index until afterwards

>
> After 36 hours it had only written 3 GB (determined by looking at what
> files it was writing to).
> I started over with a TRUNCATE, and then removed the index and tried again.
> This time it took 3807270.780 ms (a bit over an hour).
> Total number of records: approx 227 million, comprising 16GB of storage.
>
> Why the huge discrepancy?

Maintaining indices when rows are inserted in a random order generates
a huge amount of scattered I/O.

Cheers,

Jeff

In response to

*really* bad insert performance on table with unique index at 2012-02-02 17:28:09 from Jon Nelson

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Gudmundur Johannesson	2012-02-03 08:22:01	Re: Index with all necessary columns - Postgres vs MSSQL
Previous Message	Merlin Moncure	2012-02-02 20:30:54	Re: Index with all necessary columns - Postgres vs MSSQL

Quick Links

Re: *really* bad insert performance on table with unique index

In response to

Browse pgsql-performance by date

Re: really bad insert performance on table with unique index