Quick Links

Re: count distinct slow?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Roger Pack <rogerdpack2(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: count distinct slow?
Date:	2014-11-18 01:26:24
Message-ID:	8012.1416273984@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Roger Pack <rogerdpack2(at)gmail(dot)com> writes:
> As a note, I ran into the following today (doing a select distinct is fast,
> doing a count distinct is significantly slower?)

The planner appears to prefer hash aggregation for the variants of
your query wherein the DISTINCT becomes a separate plan step. This
is evidently a good choice, with only 6192 distinct values (hence
just that many hash table entries) in 7495551 input rows. However,
COUNT(DISTINCT), or any other aggregate with a DISTINCT tag, uses
sort-then-remove-adjacent-duplicates logic for DISTINCT. That's
evidently a good deal slower for your data set; most likely the data
doesn't fit in your work_mem setting so the sort spills to disk.

The reason DISTINCT aggregates don't consider hash aggregation is
partly lack of round tuits but mostly that an aggregate needs to
execute in a fairly limited amount of memory, and we can't be sure
that the hash table wouldn't get unreasonably large.

regards, tom lane

In response to

count distinct slow? at 2014-11-17 23:18:24 from Roger Pack

Browse pgsql-general by date

	From	Date	Subject
Next Message	Jonathan Vanasco	2014-11-18 01:54:38	Re: String searching
Previous Message	Roger Pack	2014-11-17 23:18:24	count distinct slow?