Re: performance problem with loading data

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Sergey Karin <sergey(dot)karin(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, PostGIS Users Discussion <postgis-users(at)postgis(dot)refractions(dot)net>
Subject: Re: performance problem with loading data
Date: 2007-06-11 10:26:55
Message-ID: 466D236F.60800@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sergey Karin wrote:
> Hi all.
>
> I use PG 8.1.8 and PostGIS 1.1.1
> vka1=# select version();

> I develop application for loading geodata to database. In separate
> transaction the application inserts the data in separate table that
> created dynamically when transaction started. All tables has equal
> structure.

> I found *VERY* strange problem: speed of the loading process is slow
> down (first 10000 objects are loaded in 69792 ms and last 10000 objects
> in 123737 ms). And futhermore, if I do NOT close current session but
> start new transaction, the first 10000 objects will be loaded in 192279
> ms and last 10000 objects in 251742 ms. And so on!! :-(
>
> But if I drop the trigger speed of loading process is NOT slow down.
>
> Who can explain me what I do incorrect?

I think I can explain what happens (to my understanding, I'm "just a user").

When you start you have an empty table (right?). If PG's statistics are
accurate to match that situation at that point, it will know there are 0
records in it. Querying a table with 0 records is fastest with a
sequential scan.

Next you put data into the table, and at a certain point a sequential
scan will not be optimal anymore. But PG doesn't know that, so it keeps
using sequential scans!

I think you'll see performance improve if you add ANALYZE <table>
statements periodically. That way PG updates its stats on that table.

I know my explanation is a bit simplified, there are other statistics in
play. That is why I didn't advise to run ANALYZE just once after a
certain number of inserts ;)

Now this would be problematic with INSERT ... SELECT, as there's no room
to run periodical ANALYZES (maybe if you'd put in LIMIT/OFFSET, but that
feels kludgy).

For the technical people; would it be possible to use the statistics on
the table(s) in the SELECT part of the statement to update the
statistics of the table being inserted into? Maybe they wouldn't be
entirely accurate, but it wouldn't it be more accurate than statistics
that say it's empty?

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

In response to

Browse pgsql-general by date

  From Date Subject
Next Message hubert depesz lubaczewski 2007-06-11 10:55:08 Re: track ddl changes on single database
Previous Message Rikard Pavelic 2007-06-11 10:19:40 track ddl changes on single database