Quick Links

Re: iceberg queries

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Christoph Haller <ch(at)rodos(dot)fzk(dot)de>
Cc:	pgsql-sql(at)postgresql(dot)org, wweng(at)kencast(dot)com
Subject:	Re: iceberg queries
Date:	2003-02-04 18:06:20
Message-ID:	3E40011C.94F11C92@Yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Christoph Haller wrote:
>
> >
> > Does PostgreSQL optimizer handle iceberg queries well?
> >
> What do you mean by "iceberg query" ?
> I've never heard this term.

Iceberg queries compute one or more aggregate functions to find
aggregate values above a specified threshold. A typical iceberg query
would be

SELECT a, count(a)
FROM tab
GROUP BY a
HAVING count(a) >= 100;

This base form can easily be made more complicated by doing self joins
and the like. This type of query is often found in market research, data
warehousing and search engines.

As to the original question, if an index is available that returns the
rows in the sort order of the GROUP BY clause, PostgreSQL defaults to an
index scan, otherwise it will do a sort of the rows matching an optional
WHERE clause. This sorted set is then grouped and aggregated and
filtered by the HAVING clause after aggregation.

It is well known that this approach does not scale well for large data
sets. But in contrast to a specialized statistical software, PostgreSQL
has to answer the query precisely. So sampling or bucket methods aren't
options.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Re: iceberg queries at 2003-02-04 08:39:53 from Christoph Haller

Responses

Re: iceberg queries at 2003-02-04 18:37:09 from Tom Lane

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Tom Lane	2003-02-04 18:37:09	Re: iceberg queries
Previous Message	Jan Wieck	2003-02-04 17:36:03	Re: pg_views