From: | "Joshua Tolley" <eggyknap(at)gmail(dot)com> |
---|---|
To: | "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca> |
Cc: | pgsql-hackers(at)postgresql(dot)org, "Bryce Cutt" <pandasuit(at)gmail(dot)com> |
Subject: | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date: | 2008-11-03 00:41:55 |
Message-ID: | e7e0a2570811021641s560a7c27r6816946e766102f3@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Nov 2, 2008 at 4:48 PM, Lawrence, Ramon <ramon(dot)lawrence(at)ubc(dot)ca> wrote:
> Joshua,
>
> Thank you for offering to review the patch.
>
> The easiest way to test would be to generate your own TPC-H data and
> load it into a database for testing. I have posted the TPC-H generator
> at:
>
> http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
>
> The generator can produce skewed data sets. It was produced by
> Microsoft Research.
>
> After unzipping, on a Windows machine, you can just run the command:
>
> dbgen -s 1 -z 1
>
> This will produce a TPC-H database of scale 1 GB with a Zipfian skew of
> z=1. More information on the generator is in the document README-S.DOC.
> Source is provided for the generator, so you should be able to run it on
> other operating systems as well.
>
> The schema DDL is at:
>
> http://people.ok.ubc.ca/rlawrenc/tpch_pg_ddl.txt
>
> Note that the load time for 1G data is 1-2 hours and for 10G data is
> about 24 hours. I recommend you do not add the foreign keys until after
> the data is loaded.
>
> The other alternative is to run pg_dump on our data sets. However, the
> download size would be quite large, and it will take a couple of days
> for us to get you the data in that form.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of
> British Columbia Okanagan
> E-mail: ramon(dot)lawrence(at)ubc(dot)ca
I'll try out the TPC-H generator first :) Thanks.
- Josh