Re: Log Data Analytics : Confused about the choice of Database

From: Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>
To: Peeyush Agarwal <peeyushagarwal1994(at)gmail(dot)com>
Cc: PostgreSql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Log Data Analytics : Confused about the choice of Database
Date: 2014-05-13 09:43:27
Message-ID: CANsFX07apqX5v1CO_n88DJc46s193sRu9t__r4R1r4LnyRG+Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Why not store session as integer?

And timestamp as timesamp(z?) ?

If you know the types of events, also store them as integer , and save a
map of them in the app or on another table ?

And save the parameters as a json column, so you have more data-types?
Hstore only has strings.

Be carefull with the mongodb hipster on the stackoverflow post.
Elasticsearch is often used for log collection.

So, what really is the problem ?

On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal <
peeyushagarwal1994(at)gmail(dot)com> wrote:

> Hi,
>
> I have log data of the following format:
>
>
> Session Timestamp Event Parameters1 1 Started Session 1 2 Logged In Username:"user1"2 3 Started Session1 3 Started Challenge title:"Challenge 1", level:"2"2 4 Logged In Username:"user2"
>
> Now, a person wants to carry out analytics on this log data (And would
> like to receive it as a JSON blob after appropriate transformations). For
> example, he may want to receive a JSON blob where the Log Data is grouped
> by Session and TimeFromSessionStart and CountOfEvents are added before
> the data is sent so that he can carry out meaningful analysis. Here I
> should return:
>
>
> [
> {
> "session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Startd Challenge", "title":"Challenge 1", "level":"2" }]
> },
> {
> "session":2, "CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":2, "Event":"Logged In", "Username":"user2"}]
> }]
>
>
> Here, TimeFromSessionStart, CountOfEvents etc. [Let's call it synthetic
> additional data] will not be hard coded and I will make a web interface to
> allow the person to decide what kind of synthetic data he requires in the
> JSON blob. I would like to provide a good amount of flexibility to the
> person to decide what kind of synthetic data he wants in the JSON blob.
>
> If I use PostgreSQL, I can store the data in the following manner: Session
> and Event can be string, Timestamp can be date and Parameters can be
> hstore(key value pairs available in PostgreSQL). After that, I can use
> SQL queries to compute the synthetic (or additional) data, store it
> temporarily in variables in a Rails Application (which will interact with
> PostgreSQL database and act as interface for the person who wants the JSON
> blob) and create JSON blob from it.
>
> However I am not sure if PostgreSQL is the best choice for this use case.
> I have put the detailed question on SO at
> http://stackoverflow.com/questions/23544604/log-data-analytics
>
> Looking for some help from the community.
>
> Peeyush Agarwal
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Oliver Kohll - Mailing Lists 2014-05-13 09:44:59 Natural key woe
Previous Message Borodin Vladimir 2014-05-13 07:08:47 Re: Question about synchronous replication