From: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
To: "Joshua Tolley" <eggyknap(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Bryce Cutt" <pandasuit(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2009-02-26 17:08:34
Message-ID: 6EEA43D22289484890D119821101B1DF2C199C@exchange20.mercury.ad.ubc.ca
Lists: pgsql-hackers
> They're automatically generated by the dbgen utility, a link to which
> was originally published somewhere in this thread. That tool creates a
> few text files suitable (with some tweaking) for a COPY command. I've
> got the original files... the .tbz I just made is 1.8 GB :) Anyone have
> someplace they'd like me to drop it?
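(For reference, loading dbgen's .tbl output into an existing TPC-H schema
typically looks something like the sketch below; the path is illustrative,
and stripping the trailing '|' that dbgen appends to each row is the
"tweaking" usually needed first.)

    -- load one dbgen text file into its TPC-H table
    -- (dbgen emits '|'-delimited rows; strip the trailing '|' beforehand)
    COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';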
Just a note that the Z7 data set is really the uniform data set Z0. The
generator only accepts skew factors in the range Z0 to Z4, and the uniform
(Z0) data set is what is typically used when benchmarking data warehouses.
It turns out the data is not perfectly uniform, as the top 100 suppliers
and products account for 2.3% and 1.5% of LineItem, respectively. This is
just enough skew that the optimization will sometimes be triggered in the
multi-batch case (currently 1% skew is the cutoff).
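The supplier figure can be checked with a query along these lines (column
names per the usual TPC-H schema); the same query with l_partkey gives the
product figure:

    -- fraction of LineItem rows covered by the 100 most frequent suppliers
    SELECT sum(cnt)::float8 / (SELECT count(*) FROM lineitem) AS fraction
    FROM (SELECT l_suppkey, count(*) AS cnt
          FROM lineitem
          GROUP BY l_suppkey
          ORDER BY cnt DESC
          LIMIT 100) AS top_suppliers;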
I have posted a pg_dump of the TPCH 1G Z0 data set at:
http://people.ok.ubc.ca/rlawrenc/tpch1g0z.zip
(Note that ownership commands are in the dump, and make sure to VACUUM
ANALYZE after the load.) I can also post the input text files if that is
easier.
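A minimal restore sketch, assuming the archive contains a plain-SQL dump
and that a target database (here called tpch; both names are illustrative
and the file name inside the zip may differ) already exists:

    -- from the shell:
    --   unzip tpch1g0z.zip
    --   psql -d tpch -f tpch1g0z.sql
    -- then, in psql, refresh planner statistics so the skew
    -- optimization sees the real value distribution:
    VACUUM ANALYZE;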
--
Ramon Lawrence