On Fri, Feb 4, 2011 at 2:18 PM, Mark Stosberg <mark(at)summersault(dot)com> wrote:
> It looks like it's going to be trivial: divide up the data with a
> modulo, and run multiple parallel cron scripts that each process a
> slice of the data. A benchmark showed that this approach sped up our
> processing 3x when splitting the application 4 ways across 4 processors.
> (I think we failed to achieve a 4x improvement because the server was
> already busy handling some other tasks).
I once had about 2 months of machine work ahead of me for one server.
Luckily it was easy to break up into chunks and run it on all the
workstations at night in the office, and we were done in < 1 week.
pgsql was the data store for it, and it was just like what you're
talking about: break it into chunks and spread it around.
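
A minimal sketch of the modulo split discussed above, in Python. It
assumes each row has a non-negative integer id; the function names
(slice_for_worker, parallel_efficiency) are made up for illustration
and are not from either setup described in this thread. Each worker
keeps only the ids whose remainder matches its own number, so the
slices are disjoint and together cover everything.

```python
from typing import Iterable, List


def slice_for_worker(ids: Iterable[int], worker: int, num_workers: int) -> List[int]:
    """Return the ids that the given worker (0-based) should process.

    Assumption: ids are non-negative integers. In a real job the same
    filter would more likely live in the query itself, for example a
    WHERE clause of the form "id % num_workers = worker".
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0 <= worker < num_workers:
        raise ValueError("worker must be in the range 0..num_workers-1")
    return [i for i in ids if i % num_workers == worker]


def parallel_efficiency(speedup: float, num_workers: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / num_workers


if __name__ == "__main__":
    all_ids = range(1, 11)
    for w in range(4):
        print(w, slice_for_worker(all_ids, w, 4))
    # The 3x speedup on 4 processors reported in the quoted message:
    print(parallel_efficiency(3, 4))
```

Each cron script (or each workstation) would be started with its own
worker number. By this measure, the 3x result on 4 processors works out
to 75% of the ideal speedup.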