Re: Postgres or Greenplum

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Simon Windsor <simon(dot)windsor(at)cornfield(dot)me(dot)uk>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres or Greenplum
Date: 2011-06-07 22:04:04
Message-ID: BANLkTikWi=sAD6-RoVQ4ZKnAMQhSYyeZDA@mail.gmail.com
Lists: pgsql-general

On Tue, Jun 7, 2011 at 10:26 PM, Simon Windsor
<simon(dot)windsor(at)cornfield(dot)me(dot)uk> wrote:

> I have been using Postgres for many years and have recently discovered
> Greenplum, which appears to be a heavily modified, Postgres-based, multi-node
> DB that is VERY fast.
>
> All the tests that I have seen suggest that Greenplum, when implemented on a
> single server like Postgres but with several separate installations, can
> be many times faster than Postgres. This is achieved by using multiple
> DBs to store the data and using multiple logger and writer processes to
> fully use all the resources of the server.
>
> Has the Postgres development team ever considered using this technique to
> split the data into separate sequential files that can be accessed by
> multiple writer/reader processes? If so, what was the conclusion?
>
> Finally, thanks for all the good work over the years!

Yes, I've looked at implementing parallel query a number of times. My
estimate was that it's about 2 man-years of effort to do something
worthwhile there, and so far nobody has offered funding for such a
task. There was some recent discussion about obtaining funding,
so we'll see how that goes. It is of course reasonably
straightforward to achieve trivial parallelism, but that's mostly
useless in the real world. So it's on the roadmap, but some way off
yet.
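
As a rough illustration of what "trivial parallelism" can mean (this is
a sketch in Python, not PostgreSQL code, and every name in it is
hypothetical): split a single-table scan into slices, aggregate each
slice independently, and combine the partial results. Threads are used
here only to keep the example self-contained; a real executor would use
separate processes, and would also have to handle things like joins,
sorts, and moving rows between workers, which this leaves out entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Scan one slice of the table and return (sum, row count)."""
    total = 0
    count = 0
    for value in chunk:
        total += value
        count += 1
    return total, count


def parallel_avg(rows, workers=4):
    """Average a column by aggregating contiguous slices independently.

    Hypothetical helper: 'rows' stands in for one numeric column.
    """
    if not rows:
        return None
    # Ceiling division so that every row lands in exactly one slice.
    size = (len(rows) + workers - 1) // workers
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Combine step: partial sums and counts add up; partial averages would not.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

The combine step is the only coordination needed, which is why this
case is easy and says little about general queries.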

Many commercial implementations exist, and IMHO the Greenplum solution
is the best general-purpose DW solution currently available for
PostgreSQL-like environments. Greenplum does have a community edition
that is free to use, and your stated performance results match my
experience. We've worked with a number of data warehouse customers
hitting the limits and moving up to Greenplum. Once people give up the
Oracle mantra, it frees them to consider a range of alternatives.

The main reason for deferring work on parallel query has been that other
techniques have delivered useful gains more easily. For example,
partitioning allowed PostgreSQL to dramatically reduce scan times with
less complexity. Synchronous scans can also achieve good efficiencies
for cases where total throughput is important. I expect to do more
work on improving decision support query performance in the next
release (9.2), so if anybody wishes to partially fund development that
would be much appreciated.
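
To make the partitioning point concrete, here is a minimal Python
sketch of the idea (not PostgreSQL's implementation; the data and names
are invented for illustration). Each partition carries known bounds on
its key, so a range query can skip any partition whose bounds prove it
holds no matching rows. In PostgreSQL of this era the equivalent is
usually set up with table inheritance, CHECK constraints, and the
constraint_exclusion setting.

```python
from datetime import date

# Hypothetical monthly partitions: (lower bound inclusive,
# upper bound exclusive, rows of (sale_date, amount)).
PARTITIONS = [
    (date(2011, 1, 1), date(2011, 2, 1), [(date(2011, 1, 15), 10)]),
    (date(2011, 2, 1), date(2011, 3, 1), [(date(2011, 2, 10), 20)]),
    (date(2011, 3, 1), date(2011, 4, 1), [(date(2011, 3, 5), 30)]),
]


def scan(partitions, start, end):
    """Return (matching rows, partitions scanned) for start <= d < end."""
    scanned = 0
    hits = []
    for lower, upper, rows in partitions:
        # The bounds alone show that no row here can match: skip the scan.
        if upper <= start or lower >= end:
            continue
        scanned += 1
        hits.extend(row for row in rows if start <= row[0] < end)
    return hits, scanned
```

A query for February alone touches one partition out of three, so the
scan cost follows the size of the matching partitions rather than the
size of the whole table.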

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
