Re: GSoC project: K-medoids clustering in Madlib

From: Maxence AHLOUCHE <maxence(dot)ahlouche(at)gmail(dot)com>
To: "Iyer, Rahul" <Rahul(dot)Iyer(at)emc(dot)com>
Cc: "atri(dot)jiit(at)gmail(dot)com" <atri(dot)jiit(at)gmail(dot)com>, "pgsql-students(at)postgresql(dot)org" <pgsql-students(at)postgresql(dot)org>, "devel(at)madlib(dot)net" <devel(at)madlib(dot)net>, "Philip, Sujit" <Sujit(dot)Philip(at)emc(dot)com>
Subject: Re: GSoC project: K-medoids clustering in Madlib
Date: 2013-04-20 10:38:47
Message-ID: CAJeaomXfCTcLz2=pNCMU3qRSrsG16WOr9eWrR5VvinEz7UkgVQ@mail.gmail.com
Lists: pgsql-students

Hi all!

I've had a bit of fun with the k-means clustering, and have written a small
script to visualize the result of the clustering.
However, I couldn't figure out how to assign each point to a cluster from the
output of the algorithm. Could someone give me a pointer, please?
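
To make the question concrete: if the output is just a list of centroid coordinates (an assumption on my side), one client-side way to label the points would be to pick the nearest centroid for each of them. Below is a minimal sketch of that idea in plain Python; assign_clusters is a hypothetical helper, not a MADlib function, and it assumes Euclidean distance.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its closest centroid.

    Hypothetical helper: assumes the algorithm's output can be read as a
    list of centroid coordinates and that Euclidean distance is wanted.
    """
    labels = []
    for point in points:
        distances = [squared_distance(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels
```

With the labels in hand, each point could then be drawn in the color of its cluster.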

My script is written in Python 3 and uses py-postgresql (
http://python.projects.pgfoundry.org/) as its PostgreSQL interface. It also
requires Pillow (a PIL fork), which you can find here:
https://pypi.python.org/pypi/Pillow/2.0.0

Before your first run, you may want to change the settings (at the top of the
file) to connect to your PostgreSQL server.
The script will create a table in your database, populate it with random
groups of points, and then call the k-means algorithm on it. Finally, it
will generate a PNG image, displaying the points and the centroids.
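
To give an idea of what "random groups of points" can look like, here is a minimal sketch of one way to generate such data. It is not the code from the attached script: make_groups, its parameters, and the choice of Gaussian blobs around random centers are all assumptions made for illustration.

```python
import random


def make_groups(n_groups, points_per_group, size=500.0, spread=20.0, seed=None):
    """Generate 2D points scattered around n_groups random centers.

    Hypothetical helper: each group is a Gaussian blob with standard
    deviation `spread`, centered somewhere in a size x size square.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_groups):
        # Pick a random center for this group.
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        for _ in range(points_per_group):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points
```

The resulting list of (x, y) tuples could then be inserted into a table, one row per point, before running the clustering on it.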

For a first run, use something like this:
./k-means_test.py --regen -o clustered_data.png

You can call "./k-means_test.py -h" for a list of available options.
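
For reference, options like the ones in the command above could be declared with argparse roughly as follows. This is only a sketch: the long name --output, the default file name, and the help texts (including what --regen does) are assumptions, so check -h for the real list.

```python
import argparse


def build_parser():
    """Build a command-line parser with the two options used above."""
    parser = argparse.ArgumentParser(
        description="Run k-means on random points and render the result."
    )
    # Assumed meaning: recreate and repopulate the table of points.
    parser.add_argument("--regen", action="store_true",
                        help="regenerate the random data set")
    # The long form and the default value are assumptions.
    parser.add_argument("-o", "--output", default="clustered_data.png",
                        help="path of the PNG image to write")
    return parser
```

Calling build_parser().parse_args() then yields an object with the regen and output attributes.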

Attached are my script and an example of its output.

By the way, I'll be very busy next week, as I have several exams coming up
and a big project to finish (about empirical orthogonal functions), so
I'll probably be inactive for a few days. After that I'll be on holiday, so I
will be able to focus on MADlib and GSoC :)

Regards,
Maxence

2013/4/19 Iyer, Rahul <Rahul(dot)Iyer(at)emc(dot)com>

> Hi Akansha,
>
> I am confused about the question - MADlib is open-source and
> available from GitHub. If you're having trouble forking or cloning it, or
> have a specific question about a module, we would be glad to help you.
> Please be specific about your question.
>
> - Rahul
> ---------------------------------------------------------
> Rahul Iyer
> Senior Software Engineer | Predictive Analytics
> rahul(dot)iyer(at)emc(dot)com
>
> On Apr 19, 2013, at 3:13 AM, Akansha Singh wrote:
>
> Hi MADlib guys, any updates? For my part, I am trying to understand the
> modules on GitHub and to get hands-on experience with them.
> http://madlib.net/ https://github.com/madlib/madlib/
>
>
>

--
Maxence Ahlouche
06 06 66 97 00
93 avenue Paul DOUMER
24100 Bergerac

Attachment Content-Type Size
k-means_test.py application/octet-stream 5.8 KB
