Re: Log Data Analytics : Confused about the choice of Database

From: Peeyush Agarwal <peeyushagarwal1994(at)gmail(dot)com>
To: Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>
Cc: PostgreSql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Log Data Analytics : Confused about the choice of Database
Date: 2014-05-14 04:48:37
Message-ID: CA+-6-YJOmFFCdtHQ1UMV=sW=HM8Mq0LyjRLCOf9ABL_+BOdAEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

Thanks for the reply :)

Yes, Storing timestamp as timestamp [ (p) ] would be better. I simplified
the session in question. It may contain alphabets as well. So, I will
probably need to store it as a string only.

The problem with types of events is that it is not fixed and will keep
increasing over time (as more event types are added depending on the need).
Would you still recommend saving a map in another table? Will it have a
significant storage benefit and are there any other benefits? Storing it in
the app will require some addition in code each time a new event type is
added which is not difficult but time consuming and tedious.

I am not very familiar with json data type. Can I query on it as
effectively as hstore? Also, is it possible to index it partially so as to
increase query speeds for certain types of queries? For eg. I would require
parameter username in a lot of queries so I would like to partially index
over it.

Peeyush Agarwal

On Tue, May 13, 2014 at 3:13 PM, Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>wrote:

> Why not store session as integer?
>
> And timestamp as timesamp(z?) ?
>
> If you know the types of events, also store them as integer , and save a
> map of them in the app or on another table ?
>
> And save the parameters as a json column, so you have more data-types?
> Hstore only has strings.
>
> Be carefull with the mongodb hipster on the stackoverflow post.
> Elasticsearch is often used for log collection.
>
> So, what really is the problem ?
>
>
>
> On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal <
> peeyushagarwal1994(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> I have log data of the following format:
>>
>> Session Timestamp Event Parameters1 1 Started Session 1 2 Logged In Username:"user1"2 3 Started Session1 3 Started Challenge title:"Challenge 1", level:"2"2 4 Logged In Username:"user2"
>>
>> Now, a person wants to carry out analytics on this log data (And would
>> like to receive it as a JSON blob after appropriate transformations). For
>> example, he may want to receive a JSON blob where the Log Data is grouped
>> by Session and TimeFromSessionStart and CountOfEvents are added before
>> the data is sent so that he can carry out meaningful analysis. Here I
>> should return:
>>
>> [
>> {
>> "session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Startd Challenge", "title":"Challenge 1", "level":"2" }]
>> },
>> {
>> "session":2, "CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":2, "Event":"Logged In", "Username":"user2"}]
>> }]
>>
>>
>> Here, TimeFromSessionStart, CountOfEvents etc. [Let's call it synthetic
>> additional data] will not be hard coded and I will make a web interface to
>> allow the person to decide what kind of synthetic data he requires in the
>> JSON blob. I would like to provide a good amount of flexibility to the
>> person to decide what kind of synthetic data he wants in the JSON blob.
>>
>> If I use PostgreSQL, I can store the data in the following manner:
>> Session and Event can be string, Timestamp can be date and Parameters can
>> be hstore(key value pairs available in PostgreSQL). After that, I can
>> use SQL queries to compute the synthetic (or additional) data, store it
>> temporarily in variables in a Rails Application (which will interact with
>> PostgreSQL database and act as interface for the person who wants the JSON
>> blob) and create JSON blob from it.
>>
>> However I am not sure if PostgreSQL is the best choice for this use case.
>> I have put the detailed question on SO at
>> http://stackoverflow.com/questions/23544604/log-data-analytics
>>
>> Looking for some help from the community.
>>
>> Peeyush Agarwal
>>
>
>

--
Peeyush Agarwal
IIT Kanpur
+91 8953453689

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jürgen Fuchsberger 2014-05-14 06:17:05 LDAP authentication not working
Previous Message Jesus Rafael Sanchez Medrano 2014-05-13 23:00:23 Re: Full-Text Search question