GSoC project: K-medoids clustering in Madlib

From: Maxence AHLOUCHE <maxence(dot)ahlouche(at)gmail(dot)com>
To: atri(dot)jiit(at)gmail(dot)com, Rahul(dot)Iyer(at)emc(dot)com, Sujit(dot)Philip(at)emc(dot)com, devel(at)madlib(dot)net, pgsql-students(at)postgresql(dot)org
Subject: GSoC project: K-medoids clustering in Madlib
Date: 2013-04-15 21:18:50
Message-ID: CAJeaomXH1x3SaenmRPdWho9POMBZZmTsbM-iGJm03sH35BKYnQ@mail.gmail.com
Lists: pgsql-students

Hi again!

I am "viod(dot)len(at)gmail(dot)com", but I am now writing from my "true" email address.

Note that I've sent this mail to both the MADlib and PostgreSQL mailing lists,
in order to synchronize the efforts. I've also sent it to everyone who was
CC'ed in previous mails.
Where should I send mails regarding this project from now on? Sending to
both mailing lists seems like quite a bad idea.

As I had lost hope of getting an answer from MADlib, I have recontacted
Atri, as he was willing to mentor the MADlib projects.

Here is his answer:

On 15 Apr 2013 17:46, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com> wrote:
On Mon, Apr 15, 2013 at 9:12 PM, viod <viod(dot)len(at)gmail(dot)com> wrote:
> Hello all!
>
>
>>
>> Do you have any interest in data analytics? I proposed a couple of
>> ideas in that field. If you are interested, we could talk over them.
>>
>> Regards,
>>
>> Atri
>
>
> I am pretty much interested in the ideas you posted on the mailing list
> (particularly in implementing the K-medoids algorithm). I've asked MADlib if
> they could mentor me, but unfortunately their org has not been accepted in
> GSoC, and they haven't answered since then.

No issues, I can try to help you out.

> Do you think I could still do it with PostgreSQL? I would really like to do
> this project, as an initiation to classification algorithms.
>
> Still, I don't really understand how this would be used by PostgreSQL?

MADlib is the de facto library for in-database analytics in PostgreSQL.

Download and install MADlib, run a few programs, think more about the
implementation, and discuss it here.

Regards,

Atri

And here is Rahul's message:

Hi Maxence,

Welcome aboard on MADlib development. We would be happy to help you out in
adding K-medoids to the MADlib suite.

> Do you have any idea of stuff I could do to get familiar with the code?
> I've already been through the doc to search for functions I already knew,
> and found a little bit, I'll go and read the code by the end of the week.

You could start off by looking at the Linear Regression code to understand
the workflow. Another document that would be useful to review is the design
doc (found here <http://madlib.net/design.pdf>). Chapter 1 gives an
overview of the Abstraction layer that is used by all modules.
(There are some bibtex errors that I am debugging, but it's still readable.)

Linear regression does not use the iterative constructs and is easier to
understand. You could then look at how k-means is implemented since
k-medoids would interplay with it in the final product.
I believe once we go through the k-medoids implementation, extending the
backprop code would be easy. So we could definitely look at that when we
get to it.

Sometimes we get busy enough to not be able to give a quick response on the
devel(at)madlib list but keep posting questions there (or ping when you don't
hear back) and we will be able to support you in this endeavor.

Best,
Rahul

By the way, my "ping" message was not meant to look aggressive, just in
case it did. My phone simply ate my question mark.

I'm also including the introduction I sent to MADlib's mailing list, so
that the PostgreSQL folks can also get a better idea of who I am:

I'm Maxence Ahlouche, and I have now been studying IT for almost three years.
I spent the first two years of my studies in the French equivalent of an
HND, a very technical training. After obtaining my diploma, I joined an
engineering school, as I wanted to learn more theoretical material and
better understand the tools I use every day.
My current training is actually called IT and Applied Mathematics (and I
currently have some difficulties in mathematics, as all the other students,
except for one, have done a very maths-intensive "preparatory course"
before coming here). Still, I'm really interested in what I learn, and am
very curious about many things.

At first, I wanted to apply for a PostgreSQL project, and, while lurking on
their mailing list, I found a reference to the aforementioned project about
the K-medoids algorithm. I found this project a perfect fit for my
interests: a teacher made me love databases and want to learn more about
their internals, and machine learning is a domain that's been attracting me
for a while now.

As to my skills, I've learnt lots of programming languages (non-exhaustive
list: C, C++, Java, a bit of Matlab and Fortran, Bash, PHP, C#, VBA,
Python, Caml...). I know how to learn by myself and quickly. During my
courses, I've done a (very little) bit of classification: we had to
determine the zone to which a pixel belongs via maximum likelihood.
This made me want to learn more about this domain.

That being said, I thank you all for your involvement :)

--
Maxence Ahlouche
06 06 66 97 00
93 avenue Paul DOUMER
24100 Bergerac
