Quick Links

Re: Spilling hashed SetOps and aggregates to disk

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	David Gershuni <dgershun(at)cs(dot)cmu(dot)edu>
Cc:	Claudio Freire <klaussfreire(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Spilling hashed SetOps and aggregates to disk
Date:	2018-06-21 20:04:21
Message-ID:	1529611461.3820.55.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 2018-06-21 at 11:04 -0700, David Gershuni wrote:
> This approach seems functionally correct, but I don't like the idea
> of
> transforming O(N) tuples of disk I/O into O(S*N) tuples of disk I/O
> (in the worst case).

It's the same amount of I/O as the idea you suggested as putting the
hash tables into separate phases, but it may consume more disk space at
once in the worst case.

We can mitigate that if it becomes a problem later.

> b) is accomplished by not sorting groups by their actual grouping
> keys, but
> instead sorting by their hash values. This works because we don't
> need a
> true sort. We just need to process identical groups in a consistent
> order
> during the merge step. As long as we're careful about collisions
> during the
> merge, this should work.

Can you expand on how we should handle collisions? If all we can do is
hash and compare equality, then it seems complex to draw group
boundaries.

Regards,
Jeff Davis

In response to

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-21 18:04:27 from David Gershuni

Responses

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-21 20:44:56 from David Gershuni

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2018-06-21 20:27:30	Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Previous Message	Tom Lane	2018-06-21 20:00:49	PSA: --enable-coverage interferes with parallel query scheduling