
Re: Hadoop backend?

From:	pi song <pi(dot)songs(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hadoop backend?
Date:	2009-02-22 22:18:52
Message-ID:	1b29507a0902221418u4fcb57b9ub891b69efe516ccc@mail.gmail.com
Lists:	pgsql-hackers

One more problem is that data placement on HDFS is implicit, meaning you
have no explicit control over where blocks end up. As a result, you cannot
place two data sets that are likely to be joined on the same node, which
leads to unpredictable latency during query processing.
Pi Song

On Mon, Feb 23, 2009 at 7:47 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sat, Feb 21, 2009 at 9:37 PM, pi song <pi(dot)songs(at)gmail(dot)com> wrote:
> > 1) The Hadoop file system is heavily optimized for read-mostly workloads.
> > 2) As of a few months ago, HDFS doesn't support file appending.
> > There might be a bit of an impedance mismatch in making them work together.
> > However, I think it would be a very good initiative to come up with ideas
> > for running postgres on a distributed file system (it doesn't have to be
> > Hadoop specifically).
>
> In theory, I think you could make postgres work on any type of
> underlying storage you like by writing a second smgr implementation
> that would exist alongside md.c. The fly in the ointment is that
> you'd need a more sophisticated implementation of this line of code,
> from smgropen:
>
> reln->smgr_which = 0; /* we only have md.c at present */
>
> Logically, it seems like the choice of smgr should track with the
> notion of a tablespace. IOW, you might want to have one tablespace that is
> stored on a magnetic disk (md.c) and another that is stored on your
> hypothetical distributed filesystem (hypodfs.c). I'm not sure how
> hard this would be to implement, but I don't think smgropen() is in a
> position to do syscache lookups, so probably not that easy.
>
> ...Robert
>
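
The per-tablespace smgr choice described in the quoted message can be
illustrated with a small standalone sketch. This is not PostgreSQL source:
every name below (ToySmgr, toy_smgropen, toy_smgrread, the tablespace map,
and the OIDs 1 and 2) is hypothetical, and the static map merely stands in
for whatever lookup a real implementation would need, which is exactly the
part the message expects to be hard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy storage-manager interface (hypothetical, one callback only). */
typedef struct ToySmgr {
    const char *name;
    int (*read_block)(unsigned blkno, char *buf, size_t len);
} ToySmgr;

/* Stand-in for md.c: tags the buffer so the caller can see who served it. */
static int md_read_block(unsigned blkno, char *buf, size_t len)
{
    return snprintf(buf, len, "md:%u", blkno);
}

/* Stand-in for the hypothetical distributed-filesystem manager. */
static int hypodfs_read_block(unsigned blkno, char *buf, size_t len)
{
    return snprintf(buf, len, "hypodfs:%u", blkno);
}

/* Switch table: index 0 is the default manager. */
static const ToySmgr smgrsw[] = {
    {"md", md_read_block},
    {"hypodfs", hypodfs_read_block},
};
#define NSMGR (sizeof(smgrsw) / sizeof(smgrsw[0]))

/* Assumed mapping from tablespace OID to manager name (OIDs are made up). */
typedef struct ToyTablespaceMap {
    unsigned spc_oid;
    const char *smgr_name;
} ToyTablespaceMap;

static const ToyTablespaceMap tablespace_map[] = {
    {1, "md"},
    {2, "hypodfs"},
};
#define NMAP (sizeof(tablespace_map) / sizeof(tablespace_map[0]))

/* Return the switch-table index for a manager name, or 0 if unknown. */
static int toy_smgr_lookup(const char *name)
{
    for (size_t i = 0; i < NSMGR; i++)
        if (strcmp(smgrsw[i].name, name) == 0)
            return (int) i;
    return 0;
}

/* Return the manager index for a tablespace, defaulting to md (0). */
static int toy_smgr_for_tablespace(unsigned spc_oid)
{
    for (size_t i = 0; i < NMAP; i++)
        if (tablespace_map[i].spc_oid == spc_oid)
            return toy_smgr_lookup(tablespace_map[i].smgr_name);
    return 0;
}

typedef struct ToyRelation {
    unsigned spc_oid;
    int smgr_which;
} ToyRelation;

/* Replaces the hard-coded "smgr_which = 0" with a per-tablespace choice. */
ToyRelation toy_smgropen(unsigned spc_oid)
{
    ToyRelation reln;

    reln.spc_oid = spc_oid;
    reln.smgr_which = toy_smgr_for_tablespace(spc_oid);
    assert(reln.smgr_which >= 0 && (size_t) reln.smgr_which < NSMGR);
    return reln;
}

/* Dispatch a block read through whichever manager the relation uses. */
int toy_smgrread(const ToyRelation *reln, unsigned blkno,
                 char *buf, size_t len)
{
    return smgrsw[reln->smgr_which].read_block(blkno, buf, len);
}
```

Opening a relation in the tablespace mapped to "hypodfs" routes its reads to
hypodfs_read_block, while any unmapped tablespace falls back to the md
stand-in. The sketch sidesteps the real difficulty by using a static array;
in the server the mapping would presumably live in a catalog, which is where
the syscache concern raised in the message comes in.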
