From: jao(at)geophile(dot)com
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Insert rate drops as table grows
Date: 2006-01-31 15:28:04
Message-ID: 20060131102804.fay5uo2dicoc0s84@geophile.com
Lists: pgsql-general
[Sorry if this is a repeat. Thought I sent this but I didn't see
it come through the mailing list.]
Quoting Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> jao(at)geophile(dot)com writes:
>> I have this table and index:
>> create table t(id int, hash int);
>> create index idx_t on t(hash);
>
>> The value of the hash column, which is indexed, is a pseudo-random
>> number. I load the table and measure the time per insert.
>
>> What I've observed is that inserts slow down as the table grows to
>> 1,000,000 records. Observing the pg_stat* tables, I see that the data
>> page reads per unit time stay steady, but that index page reads grow
>> quickly, (shared_buffers was set to 2000).
>
> Define "quickly" ... the expected behavior is that cost to insert into
> a btree index grows roughly as log(N). Are you seeing anything worse
> than that?
No, that's not what I'm seeing. The index block reads start low and
rise quickly to an approximate plateau. I've placed my test program
and results here: http://geophile.com/insert_slowdown.
- InsertPerformance.java: The test program (using the 8.0 JDBC driver
against a 7.4.8 database; both the database and the test run on my
laptop). A rough sketch of its timing loop appears after this list.
- block_reads.jpg: Graph of data and index block reads, as reported by
the pg_stat* tables, sampled every 15 seconds (for a load of 1,000,000
rows).
- insert_rate_vs_inserts.jpg: Graph of insert rate as a function of
the number of rows inserted.
- insert_rate_vs_time.jpg: Graph of insert rate as a function of wall
clock time.
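For anyone who doesn't want to download the Java file, here is a rough
sketch of the kind of loop it runs (this is a simplified reconstruction,
not the exact code at the URL above; the batch size, connection URL, and
credentials are placeholders):

  import java.sql.*;
  import java.util.Random;

  public class InsertTimingSketch {
      public static void main(String[] args) throws Exception {
          // Old-style driver registration (needed for JDBC drivers of this era).
          Class.forName("org.postgresql.Driver");
          // Placeholder URL and credentials -- adjust for your setup.
          Connection conn = DriverManager.getConnection(
              "jdbc:postgresql://localhost/test", "user", "password");
          conn.setAutoCommit(false);

          PreparedStatement insert =
              conn.prepareStatement("insert into t(id, hash) values (?, ?)");

          final int TOTAL_ROWS = 1000000;
          final int BATCH = 1000;  // rows per commit and per measurement (assumed)
          Random random = new Random();

          long batchStart = System.currentTimeMillis();
          for (int id = 1; id <= TOTAL_ROWS; id++) {
              insert.setInt(1, id);
              insert.setInt(2, random.nextInt());  // pseudo-random hash value
              insert.executeUpdate();
              if (id % BATCH == 0) {
                  conn.commit();
                  long now = System.currentTimeMillis();
                  // Average time per insert over the last batch.
                  System.out.println(id + "\t"
                      + ((now - batchStart) / (double) BATCH) + " ms/insert");
                  batchStart = now;
              }
          }
          conn.commit();
          conn.close();
      }
  }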
>
> shared_buffers of 2000 is generally considered too small for high-volume
> databases.
Understood. I set the value low to quickly test the idea that the
index cache hit rate was the issue.
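For anyone who wants to reproduce this, a query along these lines pulls
the index cache hit rate (a minimal sketch; on 7.4 it needs
stats_block_level = on, and idx_t is the index from the test):

  -- Index block cache hit rate for the test index.
  select indexrelname,
         idx_blks_read,   -- index blocks read from disk
         idx_blks_hit,    -- index blocks found in shared_buffers
         round(idx_blks_hit::numeric
               / nullif(idx_blks_hit + idx_blks_read, 0), 3) as hit_rate
  from pg_statio_user_indexes
  where indexrelname = 'idx_t';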
> Numbers like 10000-50000 are considered reasonable on modern
> hardware.
Are these values OK for 7.4.8? I've been using 8000. I thought I
remembered reading that 12000-15000 was the top end of what would be
reasonable, but I don't have a reference, and I don't think I've ever
heard a rationale for such a limit.
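For context, shared_buffers in 7.4/8.0 is a count of 8KB buffers, so the
settings under discussion translate to memory like this (just the
arithmetic, not a recommendation):

  # postgresql.conf
  shared_buffers = 8000    # 8000 * 8KB  = ~62.5 MB (my current setting)
  #shared_buffers = 20000  # 20000 * 8KB = ~156 MB (within the 10000-50000 range)

(Raising it much beyond the default may also require raising the kernel's
SHMMAX.)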
Jack Orenstein