From: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
---|---|
To: | "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca> |
Cc: | "Joshua Tolley" <eggyknap(at)gmail(dot)com>, "Bryce Cutt" <pandasuit(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Date: | 2008-12-30 04:55:02 |
Message-ID: | 603c8f070812292055t32b850d9l5a9427eea91c2ed5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> I think that setting aside a minimum percentage of work_mem may be a
> reasonable approach. For instance, setting aside 1% at even 1 MB
> work_mem would be 10 KB which is enough to store about 40 MCV tuples of
> the TPC-H database. Such a small percentage would be very unlikely (but
> still possible) to change the number of batches used. Then, given the
> memory allocation and the known tuple size + overhead, only that number
> of MCVs are selected for the MCV table regardless how many there are.
> The MCV table size would then increase as work_mem is changed up to a
> maximum given by the number of MCVs.
Sounds fine. Maybe 2-3% would be better.
> The code when building the MCV hash table keeps track of the order of
> insertion of the best MCVs. It then flushes the MCV partitions in
> decreasing order of frequency of MCVs. Thus, by the end of the build
> partitioning phase the MCV hash table should only store the most
> frequent MCV tuples. Even with many-to-many joins as long as we keep
> all build tuples that have a given MCV in memory, then everything is
> fine. You would get into problems if you only flushed some of the
> tuples of a certain MCV but that will not happen.
OK, I'll read it again - I must not have understood.
It would be good to post an updated patch soon, even if not everything
has been addressed.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Bryce Cutt | 2008-12-30 05:29:06 | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Previous Message | Robert Haas | 2008-12-30 03:16:27 | Re: Documenting serializable vs snapshot isolation levels |