From: "John D(dot) Burger" <john(at)mitre(dot)org>
To: Dan Armbrust <daniel(dot)armbrust(dot)list(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow Inserts on 1 table?
Date: 2005-08-02 14:32:08
Message-ID: 77cc8ed5b6d86115fed50515c82055f2@mitre.org
Lists: pgsql-general
> my guess is because analyze has not been run yet, so it thinks all of
> the tables are size 0. If I let it run for a while, then kill the
> load process, run Analyze, empty the tables, and then restart, things
> perform fine. But that is kind of a ridiculous sequence to have to
> use to load a database.
So automate it. After discovering exactly this behavior, I've
developed an idiom for load scripts where I (optionally) commit at some
linear interval, and (optionally) analyze at some exponential interval.
I presume this has been invented countless times, but here's my basic
idea in pseudo-code:
commitInterval = 1000
analyzeFactor = 2
whenToCommit = whenToAnalyze = commitInterval
nInserts = 0
loop over input data
    if we decide to insert
        insert
        nInserts++
        if nInserts >= whenToCommit
            commit
            whenToCommit += commitInterval
        if nInserts >= whenToAnalyze
            analyze
            whenToAnalyze *= analyzeFactor
....
So (with these constants) we commit after 1000 total inserts, then
after 2000, 3000, etc. And we analyze after 1000 inserts, then after
2000, 4000, etc. This is perhaps way too conservative - in particular,
I suspect that it's only the first one or two analyzes that matter -
but it works for me.
The only annoyance is that the interface I use most often, Python's
pgdb, runs everything in a transaction, and you can't analyze in a
transaction. I've gotten around this in a variety of ways, some less
principled than others.
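To make that concrete, here is a rough, untested sketch of the same idiom in
Python using psycopg2 rather than pgdb (psycopg2, the connection string, and
the "items" table are placeholders for illustration, not what I actually run);
the connection is switched to autocommit around the ANALYZE so it executes
outside the load transaction:

    # Sketch of the commit/analyze load idiom, assuming psycopg2 and a
    # hypothetical table: items (id integer, name text).
    import psycopg2

    COMMIT_INTERVAL = 1000
    ANALYZE_FACTOR = 2

    def load(rows, dsn="dbname=test"):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        when_to_commit = when_to_analyze = COMMIT_INTERVAL
        n_inserts = 0
        for row in rows:
            cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", row)
            n_inserts += 1
            if n_inserts >= when_to_commit:
                conn.commit()
                when_to_commit += COMMIT_INTERVAL
            if n_inserts >= when_to_analyze:
                # Close out the load transaction, then run ANALYZE on the
                # target table outside any transaction block.
                conn.commit()
                conn.autocommit = True
                cur.execute("ANALYZE items")
                conn.autocommit = False
                when_to_analyze *= ANALYZE_FACTOR
        conn.commit()
        cur.close()
        conn.close()

Calling load() on an iterable of (id, name) tuples then follows the same
schedule as the pseudo-code above: commits every 1000 inserts, analyzes after
1000, 2000, 4000, and so on.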
- John D. Burger
MITRE