Re: database design question, new to postgresql

From: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
To: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
Cc: PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: database design question, new to postgresql
Date: 2007-04-27 20:51:00
Message-ID: 46326234.7000709@aepnetworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Fei Liu wrote:
> Fei Liu wrote:
>> Hello group, I need to design and develop a web reporting system to
>> let users query/view syslog files on a unix host. For now, I am
>> concentrating on the authentication file that has user logon
>> (success/failure) and logoff records. The log file is logrotated
>> every week or so. My reporting system parses the log entries and put
>> the result into a postgresql database (I am proposing to use
>> postgresql as the backend). Since this deals with multi-year archive
>> and I believe 'partitioing' is an ideal feature to handle this
>> problem. So here is the design scheme:
>>
>> CREATE TABLE logon_success(
>> name varchar(32) not null,
>> srcip inet not null,
>> date date not null,
>> time time not null,
>> ...
>> );
>>
>>
>> CREATE TABLE logon_success_yy${year}mm${month}(
>> CHECK (date >= DATE '$year-$month-01' AND date < DATE
>> '$next_year-$next_month-1')
>> )
>> INHERITS ($tname)
>> ;
>>
>> As you can see from the sample code, I am using perl to dynamically
>> generate children tables as I parse log files in a daily cron job
>> script. Once the log file is analyzed and archived in the database, I
>> have a simple web UI that sysadmin can select and view user logon
>> events. I have built a sample framework and it works so far. Keep in
>> mind, this reporting system is not limited to just user logon, it
>> should also work with system events such as services
>> failures/startup, hardware failures, etc
>>
>> Now here are my questions:
>> 1) Should I use database to implement such a reporting system? Are
>> there any alternatives, architects, designs?
>> 2) Is partitioning a good approach to speed up log query/view? The
>> user comment in partitioning in pgsql manual seems to indicate
>> partitioning may be slower than non-partitioned table under certain
>> circumstances.
>> 3) How to avoid repetitive log entry scanning since my cron job
>> script is run daily but logrotate runs weekly? This means everytime
>> my script will be parsing duplicate entries.
>> 4) When parsing log files, it's quite possible that there are
>> identical entries, for example a user logins really fast, resulting 2
>> or more identical entries..In this case can I still use primary
>> key/index at all? If I can, how do I design primary key or index to
>> speed up query?
>> 3) What are the most glaring limitations and flaws in my design?
>>
>> Thank you for taking time to review and answer my questions! Let me
>> know if I am not clear on any specific detail..
>>
>> Fei
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
> Let me add one more question, what are the best approaches to analyze
> postgresql query performance and how to improve postgresql query
> performance?
> Fei
My initial testing has not shown any significant difference between a
partitioning approach and a plain (all entries in master) database
approach...
2005-01-01 | 00:27:55 | firewood | ssh | Login Successful | None |
local | user9819 | 192.168.1.31

My test was based on two artificial tables that has 1700 records per day
from 2004-02-01 to 2007-04-27, around 2 million entries that are
identical in both tables.
My test script:
echo Testing database $t1 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t1 where date > '2005-03-01' and date <
'2006-12-11';
\q
EOF

echo Testing database $t2 time based
time psql -p 5583 netilla postgres << EOF
select count(date) from $t2 where date > '2005-03-01' and date <
'2006-12-11';
\q
EOF

Result:
./timing_test.sh
Testing database logon_test time based
count
---------
1121472
(1 row)

0.00user 0.00system 0:02.92elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps
Testing database logon_test2 time based
count
---------
1121472
(1 row)

0.00user 0.00system 0:02.52elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+456minor)pagefaults 0swaps

But the numbers are really not static and logon_test2 (with
partitioning) sometimes behave worse than logon_test...
Fei

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Mark Deering 2007-04-28 03:27:25 need good editor for postgreSQL
Previous Message Fei Liu 2007-04-27 20:25:00 Re: database design question, new to postgresql