Fei Liu wrote:
> Hello group, I need to design and develop a web reporting system that
> lets users query/view syslog files on a unix host. For now, I am
> concentrating on the authentication log, which holds user logon
> (success/failure) and logoff records. The log file is rotated by
> logrotate every week or so. My reporting system parses the log entries
> and puts the results into a PostgreSQL database (I am proposing
> PostgreSQL as the backend). Since this deals with a multi-year
> archive, I believe partitioning is an ideal feature for the problem.
> Here is the design scheme:
>
> CREATE TABLE logon_success(
>     name  varchar(32) not null,
>     srcip inet not null,
>     date  date not null,
>     time  time not null,
>     ...
> );
>
>
> CREATE TABLE logon_success_yy${year}mm${month}(
>     CHECK (date >= DATE '$year-$month-01'
>        AND date <  DATE '$next_year-$next_month-01')
> )
> INHERITS ($tname)
> ;
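>
> For concreteness, assuming $tname expands to logon_success, the
> generated DDL for January 2007 would be:
>
> CREATE TABLE logon_success_yy2007mm01(
>     CHECK (date >= DATE '2007-01-01'
>        AND date <  DATE '2007-02-01')
> )
> INHERITS (logon_success)
> ;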
>
> As you can see from the sample code, I am using perl to dynamically
> generate child tables as I parse the log files in a daily cron job
> script. Once a log file is analyzed and archived in the database, a
> simple web UI lets the sysadmin select and view user logon events. I
> have built a sample framework and it works so far. Keep in mind that
> this reporting system is not limited to user logons; it should also
> handle system events such as service failures/startups, hardware
> failures, etc.
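>
> For example, a typical UI query runs against the parent table and
> relies on the CHECK constraints to prune children (the date range
> below is made up for illustration):
>
> -- let the planner skip children whose CHECK constraints
> -- exclude the requested range
> SET constraint_exclusion = on;
>
> SELECT name, srcip, date, time
> FROM logon_success
> WHERE date >= DATE '2007-01-01'
>   AND date <  DATE '2007-02-01'
> ORDER BY date, time;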
>
> Now here are my questions:
> 1) Should I use a database to implement such a reporting system? Are
> there any alternative architectures or designs?
> 2) Is partitioning a good approach to speed up log queries/views? The
> user comments on partitioning in the pgsql manual seem to indicate
> that a partitioned table may be slower than a non-partitioned one
> under certain circumstances.
> 3) How do I avoid rescanning log entries, given that my cron job runs
> daily but logrotate runs weekly? As it stands, every run of my script
> will be parsing entries it has already seen. (See the bookkeeping
> sketch after this list.)
> 4) When parsing log files, it's quite possible that there are
> identical entries, for example when a user logs in twice in quick
> succession, producing 2 or more identical rows. In this case can I
> still use a primary key/index at all? If so, how do I design the
> primary key or index to speed up queries? (See the index sketch after
> this list.)
> 5) What are the most glaring limitations and flaws in my design?
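>
> Re question 3, the workaround I have in mind (a sketch only; the
> table and column names are made up) is a small bookkeeping table so
> the daily cron job can skip lines it has already parsed:
>
> -- remember how far each log file has been parsed;
> -- the perl script seeks past last_line on the next run
> CREATE TABLE parse_state(
>     logfile   text      PRIMARY KEY,
>     last_line integer   NOT NULL DEFAULT 0,
>     last_run  timestamp NOT NULL DEFAULT now()
> );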
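>
> Re question 4, since identical rows are legitimate, I suspect a
> natural primary key is out; what I have in mind instead (again just
> a sketch) is a surrogate serial column in the base table, plus
> ordinary non-unique indexes on the columns the UI filters by. Indexes
> are not inherited, so each child table needs its own, e.g. for the
> January 2007 child:
>
> CREATE INDEX logon_success_yy2007mm01_name_date_idx
>     ON logon_success_yy2007mm01 (name, date);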
>
> Thank you for taking the time to review and answer my questions! Let
> me know if I am not clear on any specific detail.
>
> Fei
>
Let me add one more question: what are the best approaches to
analyzing PostgreSQL query performance, and how do I then improve it?
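I know EXPLAIN ANALYZE is the usual starting point, e.g. (the query is
just an example against my schema above):

EXPLAIN ANALYZE
SELECT name, srcip, time
FROM logon_success
WHERE date = DATE '2007-01-15';

but I'd like to hear how people read the output and what they tune
from there.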
Fei