Re: large database

From: Tony CL Chan <tonychan(at)emblocsoft(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>, "pgsql-general(at)postgresql(dot)org general" <pgsql-general(at)postgresql(dot)org>
Cc: Tony CL Chan <tonychan(at)emblocsoft(dot)com>, Mihai Popa <mihai(at)lattica(dot)com>
Subject: Re: large database
Date: 2012-12-11 13:49:21
Message-ID: DD79AEAA-64B9-400B-BEB6-10D2F2319A76@emblocsoft.com
Lists: pgsql-general

Hi,

If you have a big table you could also think about Hadoop/HBase or Cassandra, but do not put a large data set in MySQL. I agree with Bill that "despite the fact that lots of people have been able to make it (MySQL) work" (me too, another example), there are issues with it. I have been using MySQL for a number of years to handle large DBs with a large number of users, and MySQL is the bottleneck, especially when running table joins over large data sets: CPU and I/O load go up ......

If switching to PostgreSQL, PostgreSQL 9.1.x is a very good choice for production deployment.

Thanks
Tony

P.S. Today I did some stress tests on my PostgreSQL staging server: a) insert 2 billion records into the test table, b) do a full scan of the table. Here are some test results:

Facts:
Number of records: 2 billion records inserted today
Full table scan: about 16.76 minutes to scan 2 billion rows, really AMAZING!
Database size: 109GB
PostgreSQL: 9.2.1
Physical RAM: 8GB
CPU: i5

########

EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33849559.60..33849559.61 rows=1 width=0) (actual time=1006476.308..1006476.309 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..28849559.28 rows=2000000128 width=0) (actual time=47.147..903264.427 rows=2000000000 loops=1)
 Total runtime: 1006507.963 ms
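
For reference, a minimal sketch of how a comparable test table can be populated is below. The post does not show the actual schema or load method, so the single-column table and the use of generate_series are assumptions; a real load of this size would more likely be batched or done with COPY:

-- Hypothetical schema; the real test table's columns are not shown above.
CREATE TABLE test (id bigint);

-- Insert 100 million rows per statement; repeating this with shifted
-- ranges 20 times gives the 2 billion rows used in the test.
INSERT INTO test
SELECT g FROM generate_series(1, 100000000) AS g;

-- The full-scan test as run above:
EXPLAIN ANALYZE SELECT COUNT(*) FROM test;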

On 11 Dec 2012, at 8:27 PM, Bill Moran wrote:

> On Mon, 10 Dec 2012 15:26:02 -0500 (EST) "Mihai Popa" <mihai(at)lattica(dot)com> wrote:
>
>> Hi,
>>
>> I've recently inherited a project that involves importing a large set of
>> Access mdb files into a Postgres or MySQL database.
>> The process is to export the mdb's to comma separated files than import
>> those into the final database.
>> We are now at the point where the csv files are all created and amount
>> to some 300 GB of data.
>>
>> I would like to get some advice on the best deployment option.
>>
>> First, the project has been started using MySQL. Is it worth switching
>> to Postgres and if so, which version should I use?
>
> I've been managing a few large databases this year, on both PostgreSQL and
> MySQL.
>
> Don't put your data in MySQL. Ever. If you feel like you need to use
> something like MySQL, just go straight to a system that was designed with
> no constraints right off the bat, like Mongo or something.
>
> Don't put large amounts of data in MySQL. There are lots of issues with it.
> Despite the fact that lots of people have been able to make it work (me,
> for example) it's a LOT harder to keep running well than it is on
> PostgreSQL. MySQL just isn't designed to deal with large data. As some
> examples: lack of CREATE INDEX CONCURRENTLY, the fact that the default
> configuration stores everything in a single file, the fact that any table
> changes (including simple things like adding a comment, or seemingly
> unrelated things like adding an index) require a complete table rebuild,
> and the fact that if you use anything other than INT AUTO_INCREMENT for
> your primary key you're liable to hit on awful inefficiencies.
>
> PostgreSQL has none of these problems.
>
> --
> Bill Moran <wmoran(at)potentialtech(dot)com>
