Log Data Analytics : Confused about the choice of Database

From: Peeyush Agarwal <peeyusha(at)iitk(dot)ac(dot)in>
To: pgsql-general(at)postgresql(dot)org
Subject: Log Data Analytics : Confused about the choice of Database
Date: 2014-05-13 02:05:03
Message-ID: CA+-6-YJq69Py7hsFGL=4k5wKXsa5UsNEkvRfmrfxstfiUYDaAw@mail.gmail.com
Lists: pgsql-general

Hi,

I have log data of the following format:

Session  Timestamp  Event              Parameters
1        1          Started Session
1        2          Logged In          Username:"user1"
2        3          Started Session
1        3          Started Challenge  title:"Challenge 1", level:"2"
2        4          Logged In          Username:"user2"

Now, a person wants to carry out analytics on this log data (and would like
to receive it as a JSON blob after appropriate transformations). For
example, he may want a JSON blob in which the log data is grouped by
session, with TimeFromSessionStart and CountOfEvents added before the data
is sent, so that he can carry out meaningful analysis. In that case I
should return:

[
  {
    "session": 1,
    "CountOfEvents": 3,
    "Actions": [
      {"TimeFromSessionStart": 0, "Event": "Session Started"},
      {"TimeFromSessionStart": 1, "Event": "Logged In", "Username": "user1"},
      {"TimeFromSessionStart": 2, "Event": "Started Challenge", "title": "Challenge 1", "level": "2"}
    ]
  },
  {
    "session": 2,
    "CountOfEvents": 2,
    "Actions": [
      {"TimeFromSessionStart": 0, "Event": "Session Started"},
      {"TimeFromSessionStart": 2, "Event": "Logged In", "Username": "user2"}
    ]
  }
]

Here, TimeFromSessionStart, CountOfEvents, etc. (let's call them synthetic
additional data) will not be hard coded. I will build a web interface that
allows the person to decide what kind of synthetic data he requires in the
JSON blob, and I would like to give him a good amount of flexibility in
that choice.

If I use PostgreSQL, I can store the data in the following manner: Session
and Event can be strings, Timestamp can be a date, and Parameters can be
hstore (the key-value pair type available in PostgreSQL). After that, I can
use SQL queries to compute the synthetic (or additional) data, store it
temporarily in variables in a Rails application (which will interact with
the PostgreSQL database and act as the interface for the person who wants
the JSON blob), and build the JSON blob from it.
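
For concreteness, here is a minimal sketch of what that could look like,
assuming an illustrative table named events with columns session, ts, event
and parameters (hstore). The names, the integer ts column (matching the
example log's ticks), and the use of json_build_object (PostgreSQL 9.4; on
9.3 row_to_json over a subquery is an alternative) are only assumptions,
not a fixed design:

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE events (
    session    text,     -- "Session" from the log
    ts         integer,  -- the integer ticks used in the example log
    event      text,
    parameters hstore    -- e.g. 'Username=>"user1"'
);

-- Group by session, derive TimeFromSessionStart and CountOfEvents, and
-- aggregate each session's actions into a JSON array.
SELECT json_agg(s) AS result
FROM (
    SELECT e.session                         AS "session",
           count(*)                          AS "CountOfEvents",
           json_agg(json_build_object(
               'TimeFromSessionStart', e.time_from_start,
               'Event',                e.event,
               'Parameters',           hstore_to_json(e.parameters))
             ORDER BY e.ts)                  AS "Actions"
    FROM (
        SELECT session, ts, event, parameters,
               ts - min(ts) OVER (PARTITION BY session) AS time_from_start
        FROM events
    ) e
    GROUP BY e.session
) s;

(This sketch nests the hstore parameters under a "Parameters" key instead
of flattening them into each action object, and the result would still need
further shaping in the Rails layer, but it shows the kind of grouping and
derivation I have in mind.)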

However, I am not sure whether PostgreSQL is the best choice for this use
case. I have posted the detailed question on SO at
http://stackoverflow.com/questions/23544604/log-data-analytics

Looking for some help from the community.

Peeyush Agarwal
