Re: Netflix Prize data

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>, pg(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Netflix Prize data
Date: 2006-10-04 21:00:00
Message-ID: C1496EE0.372A%llonergan@greenplum.com
Lists: pgsql-hackers

Mark,

On 10/4/06 1:43 PM, "Mark Woodward" <pgsql(at)mohawksoft(dot)com> wrote:

> markw(at)snoopy:~/netflix$ time psql netflix -c "select count(*) from ratings"
> count
> -----------
> 100480507
> (1 row)
>
>
> real 2m6.270s
> user 0m0.004s
> sys 0m0.005s

I think you are getting about 40MB/s on your sequential scan of about 5GB of
heap data in this case. I calculate the size of the data as:

3 integers (12 bytes), one text date field (10 bytes?), and tuple overhead
(24 bytes) = 46 bytes per row

100 million rows x 46 bytes/row = 4.6 GB, and 4.6 GB read in 2m6s (about 126 s)
works out to roughly 36 MB/s.
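The same back-of-envelope arithmetic can be written as a small script. This is a sketch only: the per-column sizes and the 24-byte tuple overhead are the rough figures from this post, not measured values. The estimate ignores alignment padding, the varlena header on the text field, line pointers, and page-level overhead, so the real on-disk size will differ. MB here means 10^6 bytes. The helper names are hypothetical.

```python
# Hypothetical sketch of the row-size / scan-throughput estimate above.
# Sizes are the post's rough estimates, not PostgreSQL's exact on-disk layout.

INT_BYTES = 4               # one integer column
DATE_TEXT_BYTES = 10        # "YYYY-MM-DD" as text (estimate; ignores varlena header)
TUPLE_OVERHEAD_BYTES = 24   # per-row header estimate used in the post


def estimate_row_bytes(n_ints: int = 3) -> int:
    """Estimated bytes per row: integer columns + text date + tuple overhead."""
    return n_ints * INT_BYTES + DATE_TEXT_BYTES + TUPLE_OVERHEAD_BYTES


def scan_throughput_mb_s(rows: int, row_bytes: int, seconds: float) -> float:
    """Decimal MB/s needed to read rows * row_bytes in the given wall-clock time."""
    return rows * row_bytes / 1e6 / seconds


rows = 100_480_507            # count(*) from the quoted psql output
elapsed = 2 * 60 + 6.270      # "real 2m6.270s"
row_bytes = estimate_row_bytes()
total_gb = rows * row_bytes / 1e9
rate = scan_throughput_mb_s(rows, row_bytes, elapsed)
print(f"{row_bytes} bytes/row, {total_gb:.2f} GB, {rate:.1f} MB/s")
# prints "46 bytes/row, 4.62 GB, 36.6 MB/s"
```

Using the exact row count rather than "100 million" changes the size estimate only slightly; the result stays in line with the "about 40MB/s" figure.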

- Luke
