From: Richard Huxton <dev@archonet.com>
To: "Valter Douglas Lisbôa Jr." <douglas@trenix.com.br>
Cc: pgsql-general@postgresql.org
Subject: Re: High inserting by syslog
Date: 2008-07-03 16:08:26
Message-ID: 486CF97A.8060800@archonet.com
Lists: pgsql-general
Valter Douglas Lisbôa Jr. wrote:
> Hello all, I have a Perl script that loads an entire day's squid log into
> a Postgres table. I run it at midnight from a cron job, turning the
> indexes off before the load (and back on afterwards). The script works
> fine, but I want to change to a different approach.
>
> I'd like to insert the log lines on the fly, as they are generated, so
> the data is available on-line. But the table has some indexes and the
> volume is about 300,000 lines/day, an average of about 3.5 inserts/sec.
> I think this could overload the database server (I have not tested it
> yet), so I want to create an unindexed table to receive the on-line
> inserts, and run a job at midnight that moves all the lines into the
> main indexed table.
There are two things to bear in mind.
1. What you need to worry about is the peak rate of inserts, not the
average. Even at 30 rows/sec that's not too bad.
2. What will your system do if the database is taken offline for a
period? How will it catch up?
The limiting factor will be the speed of your disks. Assuming a single
disk (no battery-backed RAID cache) you'll be limited to roughly one
commit per platter revolution, i.e. your RPM (e.g. 10,000 commits/minute
from a 10,000 RPM drive). That will fall off rapidly if that one disk is
also busy with other reads and writes. But if you batch many log-lines
together you need far fewer commits: at 100 lines per commit, 300,000
rows a day is only 3,000 commits.
So - to address both points above, I'd use a script with a flexible
batch-size.
1. Estimate how many log-lines need to be saved to the database.
2. Batch together a suitable number of lines (1-1000) and commit them to
the database.
3. Sleep 1-10 seconds.
4. Go back to step 1, disconnecting and reconnecting every once in a while.
If the database becomes unavailable for any reason, this script will
automatically catch up, feeding rows in faster when it comes back.
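Purely as a sketch, that loop might look something like this in Perl with
DBI (the DSN, the access_log table and its columns, and the squid
log-field positions are all assumptions; it also anticipates the
synchronous_commit suggestion further down):

#!/usr/bin/perl
# Sketch of a flexible-batch loader, assuming DBI + DBD::Pg.
# Table name, columns and log-field positions are assumptions - adjust.
# Feed it like:  tail -F /var/log/squid/access.log | perl loader.pl
use strict;
use warnings;
use DBI;

my $BATCH_MAX  = 1000;  # at most this many rows per transaction
my $FLUSH_SECS = 5;     # ...or whatever has arrived after this long

my $dbh;
sub reconnect_db {
    eval { $dbh->disconnect if $dbh };
    $dbh = DBI->connect('dbi:Pg:dbname=squid', 'loguser', '',
                        { AutoCommit => 1, RaiseError => 1 });
    # Only this session skips the synchronous commit; a crash can lose
    # a fraction of a second of log lines, which is acceptable here.
    $dbh->do('SET synchronous_commit TO off');
    $dbh->{AutoCommit} = 0;     # from here on: one transaction per batch
}

sub flush_batch {
    my ($rows) = @_;
    return unless @$rows;
    eval {
        my $sth = $dbh->prepare(
            'INSERT INTO access_log (logged_at, client_ip, raw_line)
             VALUES (to_timestamp(?), ?, ?)');
        $sth->execute(@$_) for @$rows;
        $dbh->commit;
        @$rows = ();            # drop rows only after a successful commit
    };
    if ($@) {                   # database down? keep the rows and retry
        warn 'commit failed, holding ' . @$rows . " rows: $@";
        eval { $dbh->rollback };
        sleep 10;
        eval { reconnect_db() };
    }
}

reconnect_db();
my @batch;
my $last_flush = time;
while (defined(my $line = <STDIN>)) {
    chomp $line;
    # squid's native format: field 1 = epoch timestamp, field 3 = client IP
    my @f = split ' ', $line;
    push @batch, [$f[0], $f[2], $line];
    if (@batch >= $BATCH_MAX or time - $last_flush >= $FLUSH_SECS) {
        flush_batch(\@batch);
        $last_flush = time;
    }
}
flush_batch(\@batch);           # drain anything left at EOF

Because rows are only dropped from the in-memory batch after a successful
commit, an outage simply makes the next successful batch bigger.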
> My question is: does a better solution exist, or is this tactic a good
> way to do it?
You might want to partition the table monthly. That will make it easier
to manage a few years from now.
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
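For illustration, monthly partitioning in 8.3 is done with inheritance
and CHECK constraints; a minimal sketch (table and column names are
assumptions, matching the loader above):

-- Parent table that queries will reference:
CREATE TABLE access_log (
    logged_at  timestamptz NOT NULL,
    client_ip  inet,
    raw_line   text
);

-- One child table per month, constrained to its date range:
CREATE TABLE access_log_2008_07 (
    CHECK (logged_at >= DATE '2008-07-01'
       AND logged_at <  DATE '2008-08-01')
) INHERITS (access_log);

CREATE INDEX access_log_2008_07_logged_at
    ON access_log_2008_07 (logged_at);

-- With constraint_exclusion = on, a query filtering on logged_at
-- scans only the matching month's child table.

You would create a new child each month and point the loader's INSERTs at
the current one (or route them with a trigger); expiring an old month is
then a cheap DROP TABLE rather than a huge DELETE.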
Also, consider increasing checkpoint_segments if you find the system
gets backed up.
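That is a one-line change in postgresql.conf (the value below is only an
illustration; the 8.3 default is 3, and each segment is 16MB of WAL):

checkpoint_segments = 16    # fewer, larger checkpoints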
Perhaps consider setting synchronous_commit to off (but only for the
connection saving the log-lines to the database):
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html
--
Richard Huxton
Archonet Ltd