Re: Netflix Prize data

From:	"Luke Lonergan" <llonergan(at)greenplum(dot)com>
To:	"Mark Woodward" <pgsql(at)mohawksoft(dot)com>, pg(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Netflix Prize data
Date:	2006-10-04 21:00:00
Message-ID:	C1496EE0.372A%llonergan@greenplum.com
Lists:	pgsql-hackers

Mark,

On 10/4/06 1:43 PM, "Mark Woodward" <pgsql(at)mohawksoft(dot)com> wrote:

> markw(at)snoopy:~/netflix$ time psql netflix -c "select count(*) from ratings"
> count
> -----------
> 100480507
> (1 row)
>
>
> real 2m6.270s
> user 0m0.004s
> sys 0m0.005s

I think you are getting about 40MB/s on your sequential scan of about 5GB of
heap data in this case. I calculate the size of the data as:

3 integers (12 bytes), one text date field (10 bytes?) and tuple overhead
(24 bytes) = 46 bytes per row

100 million rows x 46 bytes/row = 4.6 GB

4.6 GB scanned in about 126 seconds (your 2m6s) is roughly 37 MB/s.
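
For anyone who wants to redo the arithmetic, here is a minimal sketch of the estimate in Python. The column sizes and the 24-byte overhead are the figures guessed at above, not measured values, and the function names are made up for illustration. It uses the exact row count and wall-clock time from the quoted psql run.

```python
# Back-of-envelope estimate of heap size and sequential scan rate.
# All sizes are the rough per-row figures from the message, not measurements.

INT_BYTES = 4
N_INT_COLUMNS = 3
DATE_TEXT_BYTES = 10        # assumed: a 'YYYY-MM-DD' string stored as text
TUPLE_OVERHEAD_BYTES = 24   # assumed per-row overhead

ROW_COUNT = 100_480_507             # from "select count(*) from ratings"
ELAPSED_SECONDS = 2 * 60 + 6.270    # "real 2m6.270s"


def bytes_per_row() -> int:
    """Estimated bytes per row: three integers, one text date, overhead."""
    return N_INT_COLUMNS * INT_BYTES + DATE_TEXT_BYTES + TUPLE_OVERHEAD_BYTES


def heap_bytes(rows: int = ROW_COUNT) -> int:
    """Estimated total heap size in bytes."""
    return rows * bytes_per_row()


def scan_rate_mb_per_s(rows: int = ROW_COUNT,
                       seconds: float = ELAPSED_SECONDS) -> float:
    """Estimated sequential scan throughput in MB/s (1 MB = 10**6 bytes)."""
    return heap_bytes(rows) / seconds / 1e6


if __name__ == "__main__":
    print(f"bytes per row: {bytes_per_row()}")
    print(f"heap size:     {heap_bytes() / 1e9:.2f} GB")
    print(f"scan rate:     {scan_rate_mb_per_s():.1f} MB/s")
```

This ignores page headers, alignment padding and any cached blocks, so treat the result as an order-of-magnitude check rather than a measurement.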

- Luke

In response to

Netflix Prize data at 2006-10-04 20:43:42 from Mark Woodward
