Re: Insert performance and multi-column index order

From: Bob Lunney <bob_lunney(at)yahoo(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Insert performance and multi-column index order
Date: 2009-06-30 12:31:43
Message-ID: 387856.59902.qm@web39707.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Greg,

Thanks for the mental prod! Yes, the original data is more closely sorted by the timestamptz column, since they represent events coming into the collection system in real time. As for the distribution of data values, it goes without saying the timestamptz value is monotonically increasing, with roughly 1300 entries having the same timestamptz value. The other three columns' values are essentially reference data, with 400 values for the varchar, 680 for the first text column, and 60 for the second text column. The distribution is fairly even, with some small spikes but nothing significant.

The "duh" moment came for me when you pointed out the implicit sort order of the data. After resorting the data into the new index column order the insert performance was largely restored. I didn't monitor the process with vmstat, however - the end result is good enough for me. I believe that the index maintenance of page splitting, etc., that you describe below was exactly the culprit, and that presorting the data solved that problem.

I call it my "duh" moment since I've presorted data for Sybase and Oracle for exactly the same reason, but forgot to apply the lesson to PostgreSQL.

BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it matters.

Thanks for the help, Greg and Tom!

--- On Sat, 6/27/09, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> From: Greg Smith <gsmith(at)gregsmith(dot)com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lunney(at)yahoo(dot)com
> Cc: pgsql-performance(at)postgresql(dot)org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lunney(at)yahoo(dot)com
> wrote:
>
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow.  I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
>
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field?  What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache.  Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading.  More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com
> http://www.gregsmith.com Baltimore, MD

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2009-06-30 13:19:11 Re: Utilizing multiple cores in a function call.
Previous Message Hartman, Matthew 2009-06-30 12:30:24 Re: Utilizing multiple cores in a function call.