From: | Peeyush Agarwal <peeyushagarwal1994(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Log Data Analytics : Confused about the choice of Database |
Date: | 2014-05-13 02:11:02 |
Message-ID: | CA+-6-YJs_OQ=jqWZgy_Q-t2zVUEaGDOb-EH8W5BANOfPg3T-tw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
I have log data of the following format:
Session Timestamp Event Parameters1 1
Started Session 1 2 Logged In
Username:"user1"2 3 Started Session1 3
Started Challenge title:"Challenge 1", level:"2"2
4 Logged In Username:"user2"
Now, a person wants to carry out analytics on this log data (And would like
to receive it as a JSON blob after appropriate transformations). For
example, he may want to receive a JSON blob where the Log Data is grouped
by Session and TimeFromSessionStart and CountOfEvents are added before the
data is sent so that he can carry out meaningful analysis. Here I should
return:
[
{
"session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session
Started"}, {"TimeFromSessionStart":1, "Event":"Logged In",
"Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Startd
Challenge", "title":"Challenge 1", "level":"2" }]
},
{
"session":2,
"CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session
Started"}, {"TimeFromSessionStart":2, "Event":"Logged In",
"Username":"user2"}]
}]
Here, TimeFromSessionStart, CountOfEvents etc. [Let's call it synthetic
additional data] will not be hard coded and I will make a web interface to
allow the person to decide what kind of synthetic data he requires in the
JSON blob. I would like to provide a good amount of flexibility to the
person to decide what kind of synthetic data he wants in the JSON blob.
If I use PostgreSQL, I can store the data in the following manner: Session
and Event can be string, Timestamp can be date and Parameters can be
hstore(key
value pairs available in PostgreSQL). After that, I can use SQL queries to
compute the synthetic (or additional) data, store it temporarily in
variables in a Rails Application (which will interact with PostgreSQL
database and act as interface for the person who wants the JSON blob) and
create JSON blob from it.
However I am not sure if PostgreSQL is the best choice for this use case. I
have put the detailed question on SO at
http://stackoverflow.com/questions/23544604/log-data-analytics
Looking for some help from the community.
Peeyush Agarwal
From | Date | Subject | |
---|---|---|---|
Next Message | Borodin Vladimir | 2014-05-13 07:08:47 | Re: Question about synchronous replication |
Previous Message | Peeyush Agarwal | 2014-05-13 02:05:03 | Log Data Analytics : Confused about the choice of Database |