Re: weird execution plan

From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: weird execution plan
Date: 2014-09-12 21:34:28
Message-ID: 1410557668457-5818905.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Huang, Suya wrote
> Both queries have been run several times so cache would have same effect
> on both of them? Below is the plan with buffer information.

Not everyone does so its nice to make certain - especially since I'm not all
that familiar with the code involved. But since no one else has answered I
will theorize.

SELECT count(*) FROM ( SELECT DISTINCT col FROM tbl )

vs

SELECT count(DISTINCT col) FROM tbl

The code for "SELECT DISTINCT col" is likely highly efficient because it
works on complete sets of records.

The code for "SELECT count(DISTINCT col)" is at a relative disadvantage
since it must evaluate one row at a time and remember whether it had seen
the same value previously before deciding whether to increment a counter.

With a large number of duplicate rows the process of making the row set
smaller before counting the end result will perform better since fewer rows
must be evaluated in the less efficient count(DISTINCT) expression - the
time saved there more than offset by the fact that you are effectively
passing over that subset of the data a second time.

HashAggregate(1M rows) + Aggregate(200k rows) < Aggregate(1M rows)

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/weird-execution-plan-tp5818730p5818905.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Huang, Suya 2014-09-15 00:39:39 Re: weird execution plan
Previous Message Huang, Suya 2014-09-12 03:41:08 Re: weird execution plan